ABSTRACT


An on-line, general-purpose, fact-retrieval system is presented which
employs a classificatory data structuring technique. The technique
embraces the basic concept of hierarchical classification of data and
provides users with multiple avenues of access to a data file.
Additionally, the data file may be partitioned into unrelated data
sets.
I. INTRODUCTION


The term "information retrieval" and the initials "IR" were coined by
the editors of Fortune about ten years ago. However, Vannevar Bush first
formally declared the necessity for an information retrieval discipline
in his article "As We May Think," which was written for the Atlantic
Monthly in 1945. The United States Government and those people involved
in Library Science were truly the first innovators of this discipline
in the mid-fifties. The technological explosion being felt at that time
prompted government agencies and library scientists to search for more
efficient systems for indexing, storing, and retrieving documents.
Their primary concern was the assurance that vital technical
information would be available to all possible users. The discipline
of information retrieval as we know it today emerged as a result of
this technological explosion.

Information retrieval has been defined in numerous ways. However, all
definitions share a common point, which is best stated by Taube
[Ref. 1]: "The right information made available to the right person at
the right time." Bourne [Ref. 2] states that "Information retrieval has
become a generic term, firmly established through common usage, which
includes reference, fact, and document retrieval." Bourne also
differentiates between data processing and information retrieval. The
former includes the manipulation, replacement, alteration, or addition
to the data on file, while the latter is concerned with the storage of
data in unaltered form for later re-use. Use of the term "information
retrieval" in this paper implies the generic meaning stated by Bourne.

This paper is devoted to the investigation of a data structuring
concept proposed by Kildall [Ref. 3] for use in a general-purpose
fact-retrieval system. Before investigating Kildall's proposal in
section VI, the techniques of indexing, storage, and retrieval
established for Library Science purposes will be reviewed. These basic
techniques form a foundation for the design of specific IR systems.

Information retrieval is divided into three major operations:

1. Indexing (classification, description, and structuring of
information sources).

2. Storage (organization and storage of files).

3. Retrieval (searching and displaying information).

Figure 1 is a simplified diagram which illustrates a typical
information retrieval process. An index is constructed which describes
the information source (document or record) and is stored in a file
along with the source itself. A request for information (query) is
directed to the index file, where the location of the requested
document within the information file is found. A search of the
information file then results in the retrieval of the document. This
process is analogous to the indexing and storing of new books received
in a library, and to the search for information by a library patron.
Figure 1

Basic Flow Diagram of the Information Retrieval Process
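The two-stage lookup of Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the index file and the information file are modeled as in-memory dictionaries, and the function names, addresses, and sample titles are hypothetical.

```python
# Minimal sketch of the Figure 1 flow (hypothetical names and data):
# an index file maps each descriptive term to information-file locations,
# and the information file maps each location to the stored source.

def build_index(terms_for: dict[int, list[str]]) -> dict[str, list[int]]:
    """Indexing: record, for each term, the locations of the sources it describes."""
    index_file: dict[str, list[int]] = {}
    for location, terms in terms_for.items():
        for term in terms:
            index_file.setdefault(term.upper(), []).append(location)
    return index_file


def retrieve(query: str, index_file: dict[str, list[int]],
             information_file: dict[int, str]) -> list[str]:
    """Retrieval: consult the index file first, then fetch from the information file."""
    locations = index_file.get(query.upper(), [])
    return [information_file[loc] for loc in locations]


information_file = {
    101: "Principles of Automated Information Retrieval",
    102: "Handbook of Steel Manufacture",  # hypothetical title
}
index_file = build_index({101: ["retrieval", "automated"], 102: ["steel"]})
print(retrieve("Steel", index_file, information_file))  # ['Handbook of Steel Manufacture']
```

A query that matches no index record simply yields an empty list, which corresponds to informing the user that nothing was found.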
II. INDEXING


Indexing is the classification, description, and structuring of
information in such a manner that retrieval of the information is
accomplished expeditiously. This task is performed on information
sources such as books, documents, and files and is an integral part of
the information retrieval process. Since retrieval is the counterpart
of indexing, the indexing and retrieval schemes used in an IR system
must be compatible in order for a user to communicate with the system.
Clearly, retrieval efficiency (i.e., the ease and speed of retrieving
desired information with a minimum of false drops, a "false drop"
being the output of irrelevant information as a result of a retrieval
request) is related to the efficiency and consistency of the indexing
process.

As a rule, the information base of an IR system is specialized and as
such requires a professional jargon. Ideally, the indexer and the
system user are both experts in this professional language. However,
this is not always the case, and the mismatch creates a problem
commonly confronted by IR system designers: how to structure
specialized data for input to the system in a manner that is
convenient to both the indexer and the user while maintaining data
accessibility. An example of an indexing language is the Dewey Decimal
System used for indexing library books.

Selection of an indexing language is based upon the following
considerations:


1. The language should be convenient to use, such as natural
language or a language that can be easily learned.

2. Computerized systems require that the language be rigid enough to
be usable by the machine, but it must also remain convenient for human
use.

3. The vocabulary should be broad enough to allow accurate description
of the information.

4. The language should be flexible enough to allow modification as
changes in information occur.

There are numerous indexing languages in use today, each tailored to
suit the specific usage of its IR system. Therefore, indexing languages
normally reflect the viewpoint of the system designer in his attempt to
organize the system's data base to best suit the needs of the user.
Several indexing techniques which evolved from Library Science will be
reviewed in the sections that follow. These techniques appear to form
the nucleus from which specialized systems are formed. Although the
techniques are primarily oriented toward document indexing, variations
are used in all types of IR systems. The techniques are presented in
ascending order of:

1. Effort on the part of the indexer.

2. Difficulty in automating.

3. Indexing power.

4. Retrieval efficiency.

A. UNIT-TERM INDEXING

The simplest indexing technique involves the extraction of descriptive
words from the information source. The source is then associated with
each of the terms used to describe its content. In the case of a
library book or other document, descriptive words may be taken from
the title, the abstract, or the text itself. This technique requires a
minimum of effort (other than reading the source) on the part of the
indexer. In addition, the indexing is accomplished rather quickly,
since the indexer need not be intimately familiar with the subject
material. Unit-term indexing is particularly advantageous when no
information is available on the spread of subject material.

The addition of new material to the data base is easily accomplished
by expanding the vocabulary (unit-terms) to include new descriptive
words. However, unit-term indexing lacks rules for combining terms
into units which have meaning. This shortcoming causes indexing
problems when synonyms, plural word forms, and generically related
terms are encountered in the source document.

The search device used in such a system is an alphabetical listing
(indexing record) of the key words used by the indexer. In general,
each key word is listed together with the information sources indexed
under it; the entry for a source may serve as a source descriptor,
indicate the location of the source, or both. The user may have
difficulty in using this system unless he knows precisely the topic
that he is searching for. An analogy may be drawn to searching the
telephone book for a name when the spelling of the name is not known.
Therefore, this indexing scheme is often utilized in IR systems where
the user is familiar with the professional jargon contained in the
information sources (e.g., technical libraries).

An excellent example of this subject-indexing technique is the
Uniterm Coordinate Indexing System, which dates back to 1952. The
Uniterm ("unit-term") System includes fifteen rules governing the
indexer's operation, rules for determining key words, methods for
processing word meanings, and cross-referencing techniques. Some
agencies using this system have drafted standard unit-terms (key
words) to be used by indexers. However, this is unnecessary for an
unstructured language, since new unit-terms may be added without
perturbing the existing system. An example of an index that might be
constructed from a Uniterm System is shown below. The numbers listed
with each unit-term might represent reference serial numbers or
library call numbers.

    ABLATION       452  573  772
    ADSORPTION     137  459  823  1201
    ADHESIVE       491
    AERODYNAMIC    139  241  242  357  552  1010  1168

The terms "subject indexing," "keyword indexing," and "coordinate
indexing" are also commonly used to describe the technique presented
here.
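The Uniterm index above lends itself to coordinate searching: a multi-term query is answered by intersecting the serial-number lists of its terms. The sketch below copies the four entries from the example; the helper names and the extra document 1300 (indexed to show that a new unit-term such as HEAT SHIELDS can be added without disturbing existing entries) are hypothetical.

```python
# Uniterm-style index: unit-term -> serial numbers (entries from the example above).
uniterm_index: dict[str, list[int]] = {
    "ABLATION": [452, 573, 772],
    "ADSORPTION": [137, 459, 823, 1201],
    "ADHESIVE": [491],
    "AERODYNAMIC": [139, 241, 242, 357, 552, 1010, 1168],
}


def add_document(index: dict[str, list[int]], serial: int, terms: list[str]) -> None:
    """Post a document under each of its unit-terms; new terms become new entries."""
    for term in terms:
        index.setdefault(term.upper(), []).append(serial)


def coordinate_search(index: dict[str, list[int]], *terms: str) -> list[int]:
    """Return the serial numbers posted under every one of the query terms."""
    postings = [set(index.get(term.upper(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical new document 1300, indexed under two existing terms and one new term.
add_document(uniterm_index, 1300, ["ablation", "aerodynamic", "heat shields"])
print(coordinate_search(uniterm_index, "ablation", "aerodynamic"))  # [1300]
```

Note that the sketch has no notion of synonyms or plurals: a query for "ABLATIONS" finds nothing, which is the weakness of unstructured unit-terms noted earlier.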
B. KEY-WORD-IN-CONTEXT INDEXING


Another very common subject indexing technique is called
"Key-Word-In-Context" (KWIC) indexing, also referred to as "permuted"
or "permuted title" indexing. The indexing power of KWIC is very
slightly greater than that of the simplest subject indexing techniques,
since each key word is shown in the context of the full title. There
are several variations in KWIC format, but essentially it is an
alphabetical listing of key words. Whole phrases are extracted from
the source so that a user can easily determine the role of each key
word.

The distinguishing feature of KWIC is its display format, shown in
the example below. Let us suppose that the title of a source document
is "Principles of Automated Information Retrieval." Assuming that the
indexer selects four key words to describe the source, the KWIC index
would appear as:

    5135 Principles of AUTOMATED Information Retrieval
    iples of Automated INFORMATION Retrieval 5135 Princ
    ion Retrieval 5135 PRINCIPLES of Automated Informat
    omated Information RETRIEVAL 5135 Principles of Aut

Note that "automated," "information," "principles," and "retrieval"
are individual key words. A user desiring this source document could
find it by using any one of the four key words. Note also that a user
may find this system easier to use than the Uniterm System if he is
unfamiliar with the subject material.
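The rotated display above can be generated mechanically. In the sketch below (an illustration, not a description of any particular production KWIC program), the serial number and title are treated as one circular string, each key word is rotated to a fixed column, and the lines are sorted by key word. The column value of 19 and the function name are assumptions, chosen so that the output reproduces the example exactly.

```python
def kwic_lines(serial: str, title: str, keywords: list[str], column: int = 19) -> list[str]:
    """Produce one rotated, fixed-width line per key word, sorted by key word."""
    circular = f"{serial} {title} "          # serial number joins the wrap-around
    n = len(circular)
    lines = []
    for kw in sorted(keywords, key=str.upper):
        pos = circular.find(kw)              # first occurrence only (assumption)
        if pos < 0:
            continue
        start = (pos - column) % n           # rotate so the key word sits at `column`
        rotated = (circular + circular)[start:start + n]
        line = rotated[:column] + kw.upper() + rotated[column + len(kw):]
        lines.append(line.rstrip())
    return lines


for line in kwic_lines("5135", "Principles of Automated Information Retrieval",
                       ["Principles", "Automated", "Information", "Retrieval"]):
    print(line)
```

Because every key word lands in the same column, a reader scanning down that column sees an alphabetical list of key words, each framed by its surrounding title text.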
C. THESAURUS

Indexing power may be increased further by determining generic
relationships between key words. The Armed Services Technical
Information Agency (ASTIA) and the Defense Documentation Center (DDC)
have produced thesauri, which are alphabetical lists of indexing terms
with related terms and "see" references. These lists are used by
indexers as a means of standardizing their operation; in other words,
indexers describe similar information sources in a consistent fashion.
These thesauri define some hierarchy among key words and are useful to
the user as well as the indexer, since they allow the user to
formulate queries with the exact terms used by the indexer. An example
of a thesaurus entry borrowed from Meadow [Ref. 4] is exhibited below.

    COMPUTERS
        (Computers and Data Systems)
        Includes:
            Calculating machines
        Generic to:
            ANALOG COMPUTERS
            ANALOG-DIGITAL COMPUTERS
            BOMBING COMPUTERS
        Also see:
            DATA PROCESSING SYSTEMS
            SIMULATION

    Computing gun sights  use  GUN SIGHTS
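One way to see how such an entry serves both indexer and user is to encode it as data. The sketch below is an illustration under assumptions: the dictionary layout and function names are hypothetical, and treating "Includes: Calculating machines" as a pointer from that phrase to COMPUTERS is an interpretation of the entry, not something the source states. A query term is first normalized through "use" references and then expanded downward through "Generic to" links.

```python
# Hypothetical encoding of the COMPUTERS entry shown above.
GENERIC_TO: dict[str, list[str]] = {
    "COMPUTERS": ["ANALOG COMPUTERS", "ANALOG-DIGITAL COMPUTERS", "BOMBING COMPUTERS"],
}
ALSO_SEE: dict[str, list[str]] = {
    "COMPUTERS": ["DATA PROCESSING SYSTEMS", "SIMULATION"],
}
# "use" references map a non-preferred phrase to the preferred indexing term.
# Treating "Calculating machines" this way is an interpretation of "Includes:".
USE: dict[str, str] = {
    "COMPUTING GUN SIGHTS": "GUN SIGHTS",
    "CALCULATING MACHINES": "COMPUTERS",
}


def preferred_term(term: str) -> str:
    """Replace a non-preferred phrase by the term the indexer actually used."""
    term = term.upper()
    return USE.get(term, term)


def expand_query(term: str) -> list[str]:
    """Preferred term plus every narrower term reachable through 'Generic to' links."""
    result: list[str] = []
    pending = [preferred_term(term)]
    while pending:
        current = pending.pop()
        if current not in result:
            result.append(current)
            pending.extend(GENERIC_TO.get(current, []))
    return result


def see_also(term: str) -> list[str]:
    """Related (not narrower) terms suggested by the entry."""
    return ALSO_SEE.get(preferred_term(term), [])


print(preferred_term("Computing gun sights"))  # GUN SIGHTS
print(expand_query("calculating machines"))
```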

D. HIERARCHICAL CLASSIFICATION

Probably the most widely used indexing technique is that of
hierarchical classification, where a universe of information is
repeatedly divided and subdivided into a classificatory tree. This
index language has a very tightly controlled but simple vocabulary
contained in an authority list of key words provided with the
classification system. Each key word in the authority list is assigned
a numeric or alphanumeric code (mnemonic codes could be used but
normally are not). As can be seen in the tree structure exhibited
below, a key word implicitly contains all those key words generic to
it (i.e., above it in the branch of the tree from which it was
derived). Hierarchical schemes allow the indexer to describe an
information source at several generic levels, so that the user may
formulate his query in more general or more specific terms by moving
up or down the classification tree.

Modification of key word meaning is difficult to accomplish, since
changing one word in the tree affects all the key words beneath it
(those to which it is generic). However, changes at the bottom of the
tree are easily made, since no perturbation of the tree occurs.
Expansion of the vocabulary used in this system is readily
accomplished by expanding the tree horizontally.

The best-known hierarchical systems are the Dewey Decimal
Classification System (exhibited below), the Library of Congress
System, and the Universal Decimal Classification System.


    500       Pure Science
    510       Mathematics
    519       Probabilities and Statistical Mathematics
    519.9     Treatment of Data
    519.92    Programming (linear and dynamic)
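Because each number in the Dewey example encodes its position in the tree, moving "up" the classification can be done by editing the code itself. The sketch below assumes the simplified pattern visible in the example (a three-digit class number followed by optional decimal digits); actual Dewey schedules have exceptions that this hypothetical helper ignores.

```python
from typing import Optional

# Captions from the Dewey example above.
DEWEY_LABELS = {
    "500": "Pure Science",
    "510": "Mathematics",
    "519": "Probabilities and Statistical Mathematics",
    "519.9": "Treatment of Data",
    "519.92": "Programming (linear and dynamic)",
}


def dewey_parent(code: str) -> Optional[str]:
    """Return the next broader class, or None at the top of the tree."""
    if "." in code:
        whole, fraction = code.split(".")
        # Drop one decimal digit; drop the point when no digits remain.
        return f"{whole}.{fraction[:-1]}" if len(fraction) > 1 else whole
    if code[2] != "0":          # e.g. 519 -> 510
        return code[:2] + "0"
    if code[1] != "0":          # e.g. 510 -> 500
        return code[0] + "00"
    return None                 # a main class such as 500


def broader_classes(code: str) -> list[str]:
    """All classes generic to `code`, from the nearest to the most general."""
    chain = []
    parent = dewey_parent(code)
    while parent is not None:
        chain.append(parent)
        parent = dewey_parent(parent)
    return chain


for c in broader_classes("519.92"):
    print(c, DEWEY_LABELS[c])
```

A user whose query on 519.92 returns too little can broaden it one level at a time by following this chain upward.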

E. FACETED INDEXING

In the immediately preceding section a classification technique was
presented which structures a topic (universe of information) by
dividing and subdividing it to form a classificatory tree. Faceted
indexing deals with individual key words taken from the data source
and grouped into categories with respect to their usage within the
source. Terms within each group are structured into a classificatory
tree. A term extracted from the source is analyzed from several points
of view, and a group of indexing terms is synthesized to describe the
key word in context. This technique is referred to as "facet
analysis," "faceted indexing," or "relational indexing," and each of
the points of view from which a key word is analyzed is called a facet.

An excellent example of faceted indexing is given by Meadow
[Ref. 4]. Let us suppose that "steel" is a key word taken from a
source document. The document contains information relating to the
manufacture, use, chemical analysis, and properties of steel. By
appending descriptors to the key word "steel," index terms such as the
following are created:

    STEEL, manufacture of
    STEEL, use in automobiles


These index terms are not predefined in any authority list but are
constructed by the indexer by appending descriptors to the key word.
The terms follow some syntactic rule, such as: subject, followed by
modifier, followed by operation modifier. The utility of this
technique is that the indexer, armed with a descriptor list and
syntactic rules tailored to suit the particular IR system, may analyze
a source from many points of view and construct index terms that
describe its information content in great detail.
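The syntactic rule just described can be expressed as a small term-building function. In this sketch the descriptor list and the function name are hypothetical; the function checks the modifier against the list and joins subject, modifier, and optional operation modifier in that order, reproducing the two STEEL terms from Meadow's example.

```python
# Hypothetical descriptor list for a system indexing metallurgy documents.
DESCRIPTORS = {"manufacture of", "use", "chemical analysis of", "properties of"}


def facet_term(subject: str, modifier: str, operation: str = "") -> str:
    """Build an index term as: SUBJECT, modifier [operation modifier]."""
    if modifier not in DESCRIPTORS:
        raise ValueError(f"{modifier!r} is not in the descriptor list")
    qualifier = f"{modifier} {operation}".strip()
    return f"{subject.upper()}, {qualifier}"


print(facet_term("steel", "manufacture of"))          # STEEL, manufacture of
print(facet_term("steel", "use", "in automobiles"))   # STEEL, use in automobiles
```

Rejecting modifiers outside the descriptor list is what keeps independently constructed terms consistent from one indexer to the next.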

F. AUTOMATIC INDEXING

In the foregoing discussions, it was assumed that the indexer was
human. A treatment of automatic (computer) indexing is now in order.

Automatic indexing is difficult to accomplish for two main reasons.
First, the information source must be in machine-readable form. In the
case of books or other lengthy documents this is a very expensive
requirement. However, the development of character recognition devices
and the production of transcripts in machine-readable code as a
by-product of automatic typesetting have eased the cost of this
requirement. The second problem, and the more serious one, is the
development of algorithms or heuristics which derive meaning from
strings of characters. This is an area of Artificial Intelligence in
which a good deal of research has been expended. However, the results
of this research have been empirical, since we lack sophisticated
linguistic and semantic knowledge. References 5, 6, 7, and 8 contain
excellent treatments of the research conducted and the problems
involved in machine translation of natural language, while Ref. 9
contains a comparison of manual and automatic indexing techniques.

There is an automatic indexing technique in commercial use today;
however, it is a "brute force" adaptation of KWIC. Basically, the
technique produces index key words by comparing words from the source
to words stored in an authority list. There are many limitations to
this system, such as the correct handling of hyphenated words, plural
forms, and proper nouns, but the primary limitation is that the list
must contain a sufficient number of appropriate words for a source to
be adequately indexed. Because the list must therefore be very large,
the storage size, processing time, and complexity of such a system are
correspondingly great.
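The brute-force approach can be sketched as a loop that compares each word of the source against the authority list. The sketch assumes a tiny, hypothetical authority list; its crude treatment of hyphens (split into separate words) and plurals (strip a final "S") illustrates the limitations mentioned above rather than solving them.

```python
import re

# Hypothetical authority list; a practical list would be far larger.
AUTHORITY_LIST = {"ABLATION", "ADHESIVE", "AERODYNAMIC", "INFORMATION", "RETRIEVAL"}


def auto_index(text: str) -> list[str]:
    """Return authority-list terms found in `text`, in order of first occurrence."""
    found: list[str] = []
    for word in re.findall(r"[A-Za-z]+", text):   # hyphenated words are split apart
        candidate = word.upper()
        if candidate not in AUTHORITY_LIST and candidate.endswith("S"):
            candidate = candidate[:-1]             # naive plural handling
        if candidate in AUTHORITY_LIST and candidate not in found:
            found.append(candidate)
    return found


print(auto_index("Adhesives for aerodynamic surfaces"))   # ['ADHESIVE', 'AERODYNAMIC']
print(auto_index("Information-retrieval systems"))        # ['INFORMATION', 'RETRIEVAL']
```

Any concept whose word is missing from the list (here "surfaces" or "systems") is silently lost, which is why the size of the list dominates the quality of the index.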

Referring to Figure 1, it is seen that the indexing process produces
index records. The contents of the records vary widely and are
dependent upon the type of IR system (e.g., document, fact, or
reference). In addition to subject descriptors, the index may contain
the location of the information, the source, the author, a reference
to another index record, or other information deemed pertinent by the
system designer. It will also be noted from the figure that the
information source, or information concerning the source, is also
stored in the IR system. In the case of a large document such as a
book, the document itself probably will not be stored in the computer;
rather, a reference or abstract will be stored as a substitute. In
some cases, the index record itself will contain all of the
information associated with an information source. For example, an
index record for a library book may contain the book's location within
the library; the system will therefore present the index record itself
in answer to a user's query.
III. STORAGE


This section of the paper contains descriptions of various techniques
used for organizing index and information files within an IR system's
storage media. There will be no discussion of storage devices, since
it is assumed that the reader is already familiar with computer
equipment. The reader is aware, of course, that the system's capacity,
cost, and response time are greatly affected by the selection of
storage media.

A. FILE ORGANIZATION

Organization of an index file or information file specifies the
positioning of the records in relation to one another within the file,
along with the physical position of the file within the storage media.
Choice of a rule which governs file organization is dependent upon
desired response time, peak retrieval loads, system reliability (the
ability to retrieve a maximum of information with a minimum of false
drops), category of users, cost, rate of information change, rate of
system growth, and type of storage media. Several rules for file
organization are used extensively in IR systems, and they are
presented here. These rules are equally applicable to index and
information files.

1. Sequential Organization

The first method involves the sequential placement of records within a
file: the (i+1)th record follows (physically and/or logically) the ith
record. Examples are the alphabetical listing of subject-indexing key
words and the alphabetical arrangement of employee records. This
method is very conservative of memory space, since there is no need to
supply pointers or links to indicate where the next record in the file
is located. On the other hand, additions or deletions to the file are
difficult to make. Let us suppose that we desire to add a new name to
the telephone book. Then all of the names which follow the inserted
name must be moved. Likewise, the deletion of a name results in
perturbation of the list. This type of organization is most commonly
used with magnetic tape, where records are searched sequentially.
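The cost of an insertion into a sequential file can be made concrete with Python's standard `bisect` module, using an in-memory list as a stand-in for the file (an assumption for illustration). `list.insert` physically shifts every record after the insertion point, which is the telephone-book problem described above; the helper names are hypothetical.

```python
import bisect


def insert_sequential(records: list[str], name: str) -> int:
    """Insert `name` in alphabetical order; return how many records had to move."""
    position = bisect.bisect_left(records, name)
    moved = len(records) - position      # every later record shifts one place
    records.insert(position, name)
    return moved


def delete_sequential(records: list[str], name: str) -> int:
    """Delete `name`; return how many records had to move to close the gap."""
    position = bisect.bisect_left(records, name)
    if position == len(records) or records[position] != name:
        raise KeyError(name)
    del records[position]
    return len(records) - position


phone_book = ["Adams", "Baker", "Doe", "Smith"]
print(insert_sequential(phone_book, "Brown"))  # 2
print(phone_book)  # ['Adams', 'Baker', 'Brown', 'Doe', 'Smith']
```

No pointers are stored, so the file stays compact; the price is paid at every insertion and deletion instead.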

2. Chaining

Another technique of file organization is called "chaining," in which
addresses (links, chains, or pointers) are stored in one or more
fields of a record to indicate the location of the next record within
the file. Recall from the discussion of indexing that thesauri contain
"see" references. These references are links which convey the idea of
chaining. Chaining is a particularly effective method when used in a
crowded memory, since "referred to" records may be placed in any
available space within the memory (unlike the rigid sequential
scheme). Also, the utility of chaining is fully realized in a system
which experiences a high rate of information change. This method
requires more memory space than the sequential scheme, since extra
fields must be appended to the records to accommodate the links.
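A chained file can be sketched as a block of storage slots in which each record carries the address of its logical successor. In this illustrative, hypothetical class, a new record goes into any free slot and only one existing link is rewritten, so no record is moved; the cost is one extra field per record.

```python
from typing import Optional


class ChainedFile:
    """Records stored in arbitrary slots, kept in alphabetical order by links."""

    def __init__(self, size: int) -> None:
        # Each occupied slot holds [name, address_of_next_record or None].
        self.slots: list[Optional[list]] = [None] * size
        self.head: Optional[int] = None   # address of the first record in order

    def _free_slot(self) -> int:
        for address, slot in enumerate(self.slots):
            if slot is None:
                return address
        raise RuntimeError("no free storage slot")

    def insert(self, name: str) -> int:
        """Store `name` in any free slot and splice it into the chain."""
        address = self._free_slot()
        previous, current = None, self.head
        while current is not None and self.slots[current][0] < name:
            previous, current = current, self.slots[current][1]
        self.slots[address] = [name, current]
        if previous is None:
            self.head = address
        else:
            self.slots[previous][1] = address   # the only existing record touched
        return address

    def in_order(self) -> list[str]:
        """Follow the links from the head to list the records logically."""
        names, current = [], self.head
        while current is not None:
            names.append(self.slots[current][0])
            current = self.slots[current][1]
        return names


f = ChainedFile(size=8)
for name in ["Smith", "Doe", "Baker"]:
    f.insert(name)
print(f.in_order())                           # ['Baker', 'Doe', 'Smith']
print([slot[0] for slot in f.slots if slot])  # physical order: ['Smith', 'Doe', 'Baker']
```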
a. Branching

An extension of the chaining technique is referred to as a "branching
structure." Branching is used to achieve versatility in changing record
entries, changing file structures, and converting, where possible,
variable-length records to fixed-length records. A trivial example
which exhibits the idea of branching file structures is shown in
Figure 2.

Let us suppose that our file consists of all military flying clubs in
the United States. Each record consists of the club's name, address
(airport, city, state), membership, and type of aircraft. Obviously,
these records are variable-length, because the number of aircraft
owned by each club is variable. The main file may be converted to
fixed-length records by replacing the aircraft type fields with a
single address. The aircraft types can then be included in another
fixed-length file. The address in the main record links to an address
file, which in turn points to the file containing the aircraft types.
Repetition of aircraft types is eliminated from the main records, the
main records are fixed-length, and changes are made only to the
address file, not to the main file or the aircraft file.

Figure 3 exhibits another feature of this technique, in which all
field entries in the main file (except the name) are replaced with
addresses. If it is later decided to add "county" to "city" and
"state," then no changes are required in the main file, but a field
must be added to each of the "city-state" file records to absorb the
new addition.
Figure 2

Conversion of Variable-Length Records to Fixed-Length Records
Using the Branching Technique
Figure 3

Addition of Records to an Existing Branching Structure
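The flying-club example of Figure 2 can be sketched with three small tables. Everything here is hypothetical (the club names, memberships, and placeholder aircraft types), and Python lists stand in for fixed-length files addressed by position. The main record keeps the same length and is never edited when a club's aircraft change; only the address file is updated.

```python
# Aircraft file: each type is stored once (placeholder names).
aircraft_file = ["Type A", "Type B", "Type C"]

# Address file: one entry per club, listing aircraft-file addresses.
address_file = [[0, 1], [1, 2, 0]]

# Main file: fixed-length records (club name, membership, address-file pointer).
main_file = [("Club One", 40, 0), ("Club Two", 25, 1)]


def aircraft_for(club: int) -> list[str]:
    """Follow main record -> address file -> aircraft file."""
    _name, _membership, pointer = main_file[club]
    return [aircraft_file[a] for a in address_file[pointer]]


def add_aircraft(club: int, aircraft_address: int) -> None:
    """Record a new aircraft type for a club; the main file is not touched."""
    pointer = main_file[club][2]
    address_file[pointer].append(aircraft_address)


before = list(main_file)
add_aircraft(0, 2)
print(aircraft_for(0))        # ['Type A', 'Type B', 'Type C']
print(main_file == before)    # True
```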


3. List Structuring

Although chaining and branching allow records to be scattered
throughout memory, their membership in a particular file is
maintained by some order of relative placement (e.g., employee records
logically linked in alphabetical order but physically scattered
throughout the file). List structuring does not require that records
be ordered in any specific manner within a file. Further, the fields
of a record may be physically separated and then linked to form a
logical record. The advantage of this form of storage is the freedom
to change field content, record content, and file structure. However,
this method requires a great deal more memory space than any other
technique. In addition, the retrieval process is relatively slow,
since more time is required to gather the elements of a record
together.

The three techniques of file organization described above (chaining,
branching, and list structuring) are all forms of list structuring,
and each demonstrates a different degree of structural freedom.
Chaining requires that fields remain contiguous, but records, while
remaining ordered, may be physically separated. Branching is an
extension of chaining that allows fields to contain address linkages
to other fields. The last method allows any ordering and structuring
of fields and records.
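List structuring can be pictured as a pool of field cells, each holding a field name, a value, and the address of the next field of the same logical record. The class below is a hypothetical sketch with invented sample data: a new field is spliced in by allocating one cell and rewriting one link (the structural freedom described above), while reading a record requires following every link (the retrieval cost).

```python
from typing import Optional


class ListStore:
    """Fields stored as separate linked cells; a record is the address of its first field."""

    def __init__(self) -> None:
        self.memory: dict[int, list] = {}   # address -> [field_name, value, next_address]
        self._next_free = 0

    def _new_cell(self, name: str, value: str, next_address: Optional[int]) -> int:
        address = self._next_free
        self._next_free += 1
        self.memory[address] = [name, value, next_address]
        return address

    def make_record(self, fields: list[tuple[str, str]]) -> Optional[int]:
        """Store the fields as a chain of cells and return the head address."""
        head = None
        for name, value in reversed(fields):
            head = self._new_cell(name, value, head)
        return head

    def read_record(self, head: Optional[int]) -> dict[str, str]:
        """Gather a logical record by following the links (one step per field)."""
        record, address = {}, head
        while address is not None:
            name, value, address = self.memory[address]
            record[name] = value
        return record

    def add_field(self, head: int, name: str, value: str) -> None:
        """Splice a new field in after the first one; no other cell moves."""
        self.memory[head][2] = self._new_cell(name, value, self.memory[head][2])


store = ListStore()
doe = store.make_record([("name", "Doe"), ("city", "Monterey"), ("state", "California")])
store.add_field(doe, "county", "Monterey")
print(store.read_record(doe))
```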

B. FILE SEQUENCING

It is important that records be sequenced (sorted) in some manner for
use in IR systems. Sequencing is normally based on some particular
attribute of a record (called a sort key), such as the "name" field of
an employee record. Selection of the sort key is based on many
considerations, but the objective is to select the same sort key as is
likely to be used in a retrieval request. Subordinate sort keys may
also be chosen for ordering records which share the same primary sort
key value (e.g., several employees with the same last name). Searching
records which are ordered on the primary sort key is then called an
"ordered search."
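Primary and subordinate sort keys map naturally onto sorting by a composite key. In this sketch the employee records are hypothetical; Python's built-in `sorted` with a tuple key orders first on last name (the primary sort key) and breaks ties on first name (a subordinate sort key).

```python
employees = [
    {"last": "Smith", "first": "Ann", "dept": 12},
    {"last": "Doe", "first": "John", "dept": 7},
    {"last": "Doe", "first": "Jane", "dept": 3},
]

# Primary sort key: last name; subordinate sort key: first name.
ordered = sorted(employees, key=lambda r: (r["last"], r["first"]))

for r in ordered:
    print(r["last"], r["first"])
# Doe Jane
# Doe John
# Smith Ann
```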
IV. RETRIEVAL


The retrieval process essentially consists of searching the index
files and information files for information which satisfies a user's
query. If the information is found, it is sent to the user; if not,
the user is so informed. It should be noted that "searching" and
"retrieval" are not synonymous. "Searching" is a file access operation
used to locate records for matching against the query, while
"retrieval" is the actual output of information which satisfies the
query. However, use of the word "retrieval" here will imply the entire
operation of searching and retrieval.

As previously discussed in section II, indexing and retrieval are
counterparts, since indexing refers to the structuring of information
for input to the files, while retrieval is the process of locating and
displaying desired information. Therefore, the query language employed
by the system user must be compatible with the index language employed
by the system designer. It is important that the query and index
languages use the same vocabulary in order for the IR system to
understand the user's requests. The user must also be familiar with
the system's logic in order to formulate an intelligent query. He must
know whether the system honors the use of Boolean relationships ("and,"
"or," "not") and magnitude comparators ("greater than," "less than,"
etc.) as query terms.
Once the query is formulated, it is input to the system's index file.
A matching process takes place at the index file, where the terms used
in the query are matched against the index file records. Index records
which match the terms of the query are employed as locators to direct
the retrieval of data from the information file.
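The matching step at the index file can be sketched for the Boolean case. The sketch is illustrative only: the nested-tuple query form, the small index, and the record addresses are hypothetical, and magnitude comparators are not handled. Each term is replaced by the set of information-file addresses posted under it, and AND, OR, and NOT become set intersection, union, and complement.

```python
from typing import Union

Query = Union[str, tuple]

# Hypothetical index file: term -> information-file addresses.
INDEX = {"STEEL": [1, 2, 3], "AUTOMOBILES": [2, 4], "MANUFACTURE": [1, 4]}
ALL_RECORDS = {1, 2, 3, 4}


def evaluate(query: Query) -> set[int]:
    """Evaluate a query such as ("AND", "steel", ("NOT", "automobiles"))."""
    if isinstance(query, str):
        return set(INDEX.get(query.upper(), []))
    operator, *operands = query
    sets = [evaluate(operand) for operand in operands]
    if operator == "AND":
        return set.intersection(*sets)
    if operator == "OR":
        return set.union(*sets)
    if operator == "NOT":
        return ALL_RECORDS - sets[0]
    raise ValueError(f"unknown operator {operator!r}")


print(sorted(evaluate(("AND", "steel", ("NOT", "automobiles")))))  # [1, 3]
print(sorted(evaluate(("OR", "automobiles", "manufacture"))))      # [1, 2, 4]
```

The resulting addresses play the role of the locators described above; only those records need to be fetched from the information file.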

The technique used in searching the index and information files is
governed by the file organization (structure, sequencing, content, and
storage medium). In the ensuing discussion of search techniques it
should be borne in mind that whatever technique is used, it is fixed
within the IR system. Also, the interrelationship between search plan
and file organization may limit file accessibility and search
flexibility.

A. FULL-FILE SEARCH

One search plan incorporates a full-file search, where every record of
the file is matched (e.g., the value of the query term is matched
against the value of the sort key). This plan is used when the order
of records within a file is unknown (e.g., a file of employee records
that are not alphabetically sorted). In this case, if we were
searching for Doe's record and found Smith's, it would not follow that
we had searched too far, since the records are not collated. In
addition, there may not be any assurance that a single match satisfies
the search (there may be more than one Doe in the file). Therefore, all
records within the file must be searched.
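A full-file search reduces to examining every record, because in an unsorted file neither a mismatch nor a match tells us that the search is finished. This hypothetical sketch returns all matching positions together with the number of comparisons, which always equals the file length.

```python
def full_file_search(records: list[str], query: str) -> tuple[list[int], int]:
    """Match every record against the query; return (matching positions, comparisons)."""
    matches = []
    comparisons = 0
    for position, key in enumerate(records):
        comparisons += 1
        if key == query:
            matches.append(position)   # keep going: there may be another Doe
    return matches, comparisons


unsorted_file = ["Smith", "Doe", "Adams", "Doe"]
print(full_file_search(unsorted_file, "Doe"))   # ([1, 3], 4)
```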
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B.  SEQUENTIAL  SEARCH 


A sequential search plan might be used when the records are not
only sequenced but sequenced on the same term as is used in the query.
Sequential searches are normally used in conjunction with
sequential-access storage devices. The records of a file are matched
sequentially until a successful match is made or until the value of the
sort key exceeds the value of the query term. In this case, searching
for Doe's record and locating Smith's record indicates not only that the
search has gone too far but also that no successful retrieval will be made,
since there is no Doe in the file.
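The early-termination rule can be sketched in Python as follows, assuming the file is a list of (sort key, payload) pairs in ascending key order; the sample file is hypothetical. The function also reports how many records were examined, which shows the search stopping at SMITH when DOE is absent.

```python
from typing import List, Optional, Tuple

def sequential_search(records: List[Tuple[str, str]], query: str) -> Tuple[Optional[str], int]:
    """Search a file sorted ascending on its sort key (first tuple element).

    Returns the matching payload (or None) and the number of records examined.
    """
    examined = 0
    for sort_key, payload in records:
        examined += 1
        if sort_key == query:
            return payload, examined   # successful match
        if sort_key > query:
            return None, examined      # gone too far: the query value is not in the file
    return None, examined              # end of file reached without a match

# Hypothetical file sequenced on last name.
sorted_file = [("ADAMS", "rec-1"), ("BROWN", "rec-2"), ("SMITH", "rec-3"), ("YOUNG", "rec-4")]
print(sequential_search(sorted_file, "BROWN"))  # ('rec-2', 2)
print(sequential_search(sorted_file, "DOE"))    # (None, 3): stops on reaching SMITH
```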

C.  BINARY  SEARCH 

A binary search plan may also be used with a sequenced file.
The term "binary" implies that a two-valued decision is made after
every match attempt. The search begins in the middle of the file. If
the first match attempt is unsuccessful, the next attempt is made
one-quarter of the file length away from the first. The direction of
each subsequent move depends upon the result of comparing the value of
the query term with the sort key (e.g., if the sort key is greater than
the query term, move one-quarter of the file toward the beginning of the
file). Each successive move is then made one-half the length of the
preceding move. If there are $n$ records in the file, approximately
$\log_2 n$ moves are needed to exhaust the file.
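The halving described above can be sketched in Python with the usual low/high-bound formulation, which halves the remaining interval after each probe rather than tracking move lengths explicitly. This is a standard equivalent, not necessarily the exact procedure used in any particular IR system. The probe counter shows the logarithmic behaviour: for 1,024 records no search needs more than 11 probes (that is, $\log_2 1024 + 1$). The key values are hypothetical.

```python
from typing import List, Optional, Tuple

def binary_search(keys: List[str], query: str) -> Tuple[Optional[int], int]:
    """Return (position of query or None, number of match attempts) for a sorted list."""
    lo, hi = 0, len(keys) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2           # first probe is the middle of the file
        probes += 1
        if keys[mid] == query:
            return mid, probes
        if keys[mid] > query:          # sort key greater than query term: move toward the beginning
            hi = mid - 1
        else:                          # otherwise move toward the end
            lo = mid + 1
    return None, probes

# Hypothetical sequenced file of 1,024 keys: K0000 ... K1023.
keys = [f"K{i:04d}" for i in range(1024)]
position, probes = binary_search(keys, "K0500")
print(position, probes)
```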
D.  DIRECT  ACCESS  SEARCHING

The last file-searching technique relies upon a special type of
index file called an inverted index. This is probably the most common
type of index file used in IR systems. The inverted file records consist
of the descriptors produced during the indexing process. The
descriptors are used as sort keys for sequencing the records within
the index. Appended to each descriptor field are fields which contain
addresses of the associated records in the information file. Some
type of search plan (usually binary) is conducted for matching descriptors
(which are sort keys) to the query term. When a successful
match is achieved, the addresses of the appropriate information records
are obtained and the records are directly retrieved.
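A small Python sketch of an inverted index, assuming descriptors are strings kept in sorted order and that integer keys of a dictionary stand in for storage addresses in the information file (both assumptions for illustration). `bisect_left` from the standard library performs the binary search over the sorted descriptors; the addresses it yields are then used for direct retrieval.

```python
from bisect import bisect_left
from typing import Dict, List

class InvertedIndex:
    """Sorted descriptors, each carrying the addresses of its information-file records."""

    def __init__(self, postings: Dict[str, List[int]]) -> None:
        self.descriptors = sorted(postings)                       # descriptors are the sort keys
        self.addresses = [postings[d] for d in self.descriptors]

    def lookup(self, term: str) -> List[int]:
        i = bisect_left(self.descriptors, term)                   # binary search on the descriptors
        if i < len(self.descriptors) and self.descriptors[i] == term:
            return self.addresses[i]
        return []                                                 # no matching descriptor

# Hypothetical information file keyed by address, and its index.
information_file = {100: "Steel alloys report", 200: "Copper wiring note", 300: "Steel corrosion study"}
index = InvertedIndex({"STEEL": [100, 300], "COPPER": [200], "CORROSION": [300]})
for address in index.lookup("STEEL"):
    print(address, information_file[address])                     # direct retrieval by address
```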

E.  COMBINED  SEARCH  PLANS 

The above treatment of search plans demonstrates that the
techniques are dependent upon file organization, but plans may be combined
in one IR system. For example, a binary search may be employed
in the index file to locate the disk and/or track which contains
the desired information, while a sequential search is made of the track
for the requested records.
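As a sketch of such a combination, assume (hypothetically) that each track holds a few records in key order and that the index records the highest key on each track. A binary search of that index, done here with the standard library's `bisect_left`, picks the one track that could hold the record, and a sequential search of the track finishes the job. The track layout and record names are invented for illustration.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Track = List[Tuple[str, str]]   # (sort key, record) pairs in key order

def combined_search(track_index: List[str], tracks: List[Track], query: str) -> Optional[str]:
    t = bisect_left(track_index, query)   # binary search: first track whose highest key >= query
    if t == len(tracks):
        return None                        # query lies beyond the last track
    for key, record in tracks[t]:          # sequential search within that track
        if key == query:
            return record
        if key > query:
            return None                    # gone too far
    return None

tracks = [
    [("ADAMS", "r1"), ("BAKER", "r2"), ("CLARK", "r3")],
    [("DOE", "r4"), ("EVANS", "r5"), ("FOX", "r6")],
    [("GRAY", "r7"), ("HILL", "r8"), ("SMITH", "r9")],
]
track_index = [track[-1][0] for track in tracks]   # highest key on each track
print(combined_search(track_index, tracks, "EVANS"))   # r5
```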
V.  RETRIEVAL  SYSTEMS


This  section  of  the  paper  contains  a  discussion  of  the  primary 
differences  between  reference,  document,  and  fact  retrieval  in  order 
to  provide  a  frame  of  reference  for  the  development  of  a  fact-retrieval 
system.  Reference  retrieval  is  treated  first  since  it  is  the  least 
complicated  of  the  three  types  of  information  retrieval. 

Queries used in a reference-retrieval system contain only the
topic for which information is desired (e.g., STEEL). The material
provided to the requestor is a list of references pertaining to his topic.

Document-retrieval queries are narrower in scope, since descriptive
terms are used to modify the topic (e.g., STEEL, chemical
properties of). Documents which contain the desired information
are provided to the requestor.

Fact-retrieval  systems  are  the  most  complicated  and  powerful 
of  all  since  they  are  capable  of  providing  specific  answers  to  specific 
questions. 

A.  REFERENCE  RETRIEVAL 

Reference retrieval is the first step taken by one in search of
specific information. As explained above, a reference-retrieval
system provides a user with a bibliography pertaining to the topic for
which specific information is sought. The second step in the search
for information is totally unrelated to the reference-retrieval system:
the user must examine the documents listed in the bibliography in
order to obtain the desired information. It is clear that in the first
step the user's search for information is narrowed from a search of
the entire "library" to a "shelf" in the library.

B.  DOCUMENT  RETRIEVAL 

The definition of document retrieval is not straightforward.
One point of view holds document retrieval to be the second step of
reference retrieval. In another point of view, it is a special case of
fact retrieval. What this author regards as document retrieval may
be fact retrieval to another. The definition upheld by this author is
the retrieval of unprocessed text, word for word, as it is stored in the
information file. An example would be requesting a specific report
from a technical library.

C.  FACT  RETRIEVAL 

Fact retrieval ranges from the retrieval of processed text
stored in an information file to the retrieval of specific answers to
specific questions. The more powerful end of the spectrum is referred
to as "question answering." Reference 10 contains an excellent
treatment of the general characterizations, limitations, capabilities,
and feasibility of the question-answering type of fact-retrieval systems.
Reference 11 contains a practical example of a question-answering
program.

Confusion arises at the low end of the fact-retrieval spectrum,
where it is difficult to distinguish between document
and fact retrieval. One point should help clarify the difference.
Document-retrieval systems possess only rote memory, which means
that their capability is limited to the display of information word for
word as it is stored in the data base. Fact-retrieval systems possess
the capability of manipulating data stored in the data base into a form
which best satisfies the user's request.


VI.  DATA  STRUCTURE  FOR  A  FACT-RETRIEVAL  SYSTEM


This section contains the description of a data-structuring technique
proposed by Kildall [Ref. 3] for use in a general-purpose
fact-retrieval system. Specific usage of the system depends in part upon
the type of information stored in its files. However, the nature of the
system is the processing of data to provide a user with specific answers
to his queries; the system therefore approaches "question answering."
The data-structuring technique employs the basic concept of hierarchical
classification, which divides a topic (also referred to as a universe
of discourse) into its class structure and correlates the data elements
of the information file to a tree-type classificatory structure.

A  treatment  of  the  retrieval  process  is  also  provided  here  since 
the  query  format  is  directly  related  to  the  data-structuring  technique. 

This section is expressly devoted to a discussion of the
data-structuring concept, while section VII contains the description of the
general-purpose fact-retrieval system which employs the proposed
technique. The system was designed for the primary purpose of investigating
the potential of the data-structure concept and not for
production purposes.

As previously discussed, fact-retrieval systems range from the
manipulation of processed text to "question answering." The system
described herein maintains a position in the middle of this continuum.
The term "general purpose" used here does not necessarily mean that
the system may be utilized throughout the full range of fact retrieval.
Rather, it means that the system will accommodate files which contain
different types of information.

A.  DATA  STRUCTURE 

The structure employed for indexing data incorporates the concept
of hierarchical classification, which allows the user to enter the
data base in a number of ways in order to extract desired information.
A universe of discourse is structured in terms of "classes," and a
hierarchy of classes is established onto which the associated data
elements are mapped. For example, assume that a universe of discourse
consists of personnel records. The records consist of names,
addresses, and telephone numbers, which are members of the classes
"NAME," "ADDRESS," and "TELEPHONE NUMBER." "NAME" is
further divided into the subclasses "LAST," "FIRST," and "MIDDLE,"
while "ADDRESS" contains "STREET," "CITY," and "STATE."

The data structure is then represented by a classificatory tree with
the data elements related to the classes contained in the tree. The
data element "DOE," for example, is identified as a member of the
class "LAST," and the class "LAST" is a member of "NAME," which in
turn is a member of "PERSONNEL RECORD." All data elements of a
structure are identified in this fashion.
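The classificatory tree for this example can be sketched in Python as a mapping from each class to its subclasses; the function below recovers the chain of classes that identifies an element such as "DOE." The dictionary layout and the function name are illustrative assumptions, not the system's own representation.

```python
from typing import Dict, List

# Class structure of the personnel-record example: each class maps to its subclasses.
CLASSES: Dict[str, List[str]] = {
    "PERSONNEL RECORD": ["NAME", "ADDRESS", "TELEPHONE NUMBER"],
    "NAME": ["LAST", "FIRST", "MIDDLE"],
    "ADDRESS": ["STREET", "CITY", "STATE"],
}

def class_path(target: str, root: str = "PERSONNEL RECORD") -> List[str]:
    """Return the chain of classes from the root down to target (empty if absent)."""
    if root == target:
        return [root]
    for sub in CLASSES.get(root, []):
        path = class_path(target, sub)
        if path:
            return [root] + path
    return []

# The data element "DOE" is a member of LAST; its full classification follows the tree.
print(" -> ".join(class_path("LAST")))   # PERSONNEL RECORD -> NAME -> LAST
```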

1.  Class  Structure  Representation

Class structures are represented by parenthesized expressions,
which are used to define the structure of the classificatory tree.
The technique of employing parentheses to define structures is similar
to that employed in LISP S-expressions [Ref. 12]. The punctuation
symbols used in the expressions are the left parenthesis, the
right parenthesis, and the comma. The parentheses are used to enclose
those classes which are directly related to a superclass, while
the comma is used to separate the classes within the parenthesized
unit. Units within an expression are separated by commas, and the
entire expression itself is enclosed by parentheses. As demonstrated
in the preceding section, "PERSONNEL RECORD" consists of the
classes "NAME," "ADDRESS," and "TELEPHONE NUMBER." This
definition is called the format definition and is the foundation for the
construction of the classificatory tree. Format definitions are
represented by the parenthesized expression shown below.

PERSONNEL  RECORD  (NAME,  ADDRESS,  TELEPHONE  NUMBER)

"NAME" and "ADDRESS" were further divided into subclasses,
and the expressions below show the parenthesized forms for "class
definitions."

NAME  (LAST,  FIRST,  MIDDLE) 

ADDRESS  (STREET,  CITY,  STATE)

Subclasses may also be subdivided, and this process is replicated
to fully define the class structure of the universe of discourse. Figure
4 graphically demonstrates the class-structuring process, the fully
parenthesized expression for the class structure, and the associated
classificatory tree. Although the above example does not include a
subdivision for the class "STREET," one is shown in the tree structure
to demonstrate a third level of class replication.

EMPLOYEE RECORD (NAME, ADDRESS, TELEPHONE NUMBER)

Figure 4

Parenthesized Class Expressions and Associated Tree
Structure for the Hierarchical Classification of Data.
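The following Python sketch shows how definitions of this form could be combined into the fully parenthesized class expression by repeatedly substituting each defined class with its parenthesized subclasses. It assumes each definition has a single level of parentheses, as in the examples above; the function names are illustrative.

```python
from typing import Dict, List, Tuple

def parse_definition(card: str) -> Tuple[str, List[str]]:
    """Split a definition such as 'NAME (LAST, FIRST, MIDDLE)' into name and member classes."""
    name, _, rest = card.partition("(")
    members = rest.strip().rstrip(")").split(",")
    return name.strip(), [m.strip() for m in members]

def expand(cls: str, definitions: Dict[str, List[str]]) -> str:
    """Substitute each defined class by the parenthesized list of its subclasses."""
    if cls not in definitions:
        return cls                                    # end node of the classificatory tree
    return "(" + ", ".join(expand(sub, definitions) for sub in definitions[cls]) + ")"

cards = [
    "PERSONNEL RECORD (NAME, ADDRESS, TELEPHONE NUMBER)",
    "NAME (LAST, FIRST, MIDDLE)",
    "ADDRESS (STREET, CITY, STATE)",
]
definitions = dict(parse_definition(card) for card in cards)
print(expand("PERSONNEL RECORD", definitions))
# ((LAST, FIRST, MIDDLE), (STREET, CITY, STATE), TELEPHONE NUMBER)
```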

2.  Data  Representation 

Once  the  class  structure  is  defined,  the  associated  data  may 
be  mapped  directly  onto  the  structure.  Data  representation  is  identical 
to  the  class  expression  as  shown  below. 


((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CA.), 384-9363)

((LAST, FIRST, MIDDLE), (STREET, CITY, STATE), TELEPHONE NO.)

(NAME, ADDRESS, TELEPHONE NO.)

EMPLOYEE RECORD


Repeated data elements within the record are easily handled by
properly parenthesizing the record. For example, two phone numbers
for John Doe would be represented by:

((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CAL.),
(384-9363, 384-6214))

The  class  membership  of  each  data  element  in  the  record  is 
clearly  defined  by  the  parenthesized  expression. 
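To make the mapping concrete, here is a Python sketch that parses a parenthesized expression into nested lists and then walks the class expression and a data record side by side. It assumes, as the repeated-telephone example suggests, that a parenthesized group appearing where the class structure has an end-node class holds several elements of that class. The code is an illustration, not the system's parser.

```python
from typing import Dict, List, Union

Node = Union[str, List["Node"]]

def parse(expr: str) -> Node:
    """Parse a fully parenthesized expression into nested lists of strings."""
    stack: List[List[Node]] = [[]]
    token = ""
    for ch in expr:
        if ch == "(":
            stack.append([])          # open a new unit
            token = ""
        elif ch in ",)":
            if token.strip():
                stack[-1].append(token.strip())
            token = ""
            if ch == ")":             # close the unit and attach it to its parent
                unit = stack.pop()
                stack[-1].append(unit)
        else:
            token += ch
    return stack[0][0]

def map_onto(classes: Node, data: Node, members: Dict[str, List[str]]) -> None:
    """Walk class and data expressions in parallel, recording each element's class."""
    if isinstance(classes, str):      # end-node class: every element at this position belongs to it
        for value in (data if isinstance(data, list) else [data]):
            members.setdefault(classes, []).append(str(value))
        return
    if not isinstance(data, list) or len(data) != len(classes):
        raise ValueError(f"record does not conform to the class structure {classes}")
    for sub_class, sub_data in zip(classes, data):
        map_onto(sub_class, sub_data, members)

classes = parse("((LAST, FIRST, MIDDLE), (STREET, CITY, STATE), TELEPHONE NO.)")
record = parse("((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CAL.), (384-9363, 384-6214))")
members: Dict[str, List[str]] = {}
map_onto(classes, record, members)
print(members["TELEPHONE NO."])       # ['384-9363', '384-6214']
```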

3.  System  Utility 

The  utility  of  hierarchical  classification  in  association  with 
parenthesized  expressions  is  realized  by  the  user  in  three  ways: 
1. The indexing techniques presented in section II require the
user to conform to the language devised by the system designer for
the retrieval of information. The user does not have the option of
defining the indexing language that best suits his particular needs but
must accept the indexing technique chosen to best satisfy
the needs of all users. In contrast, this system allows each user or
user group to define his own indexing language by defining the class
structure associated with the data he is most concerned with. In other
words, the system will accept a mix of data, allowing each user or user
group to have his own retrieval system within a retrieval system.

Each user or user group must define the class structure of his data.
For example, a business-oriented system might consist of a data base
partitioned into employee records, pay records, stock inventory, etc.
Such a system would simultaneously serve the needs of many users.

2. The user has the capability of entering the data structure in
several ways to extract desired information. In the personnel-record
example, the user may retrieve complete records which satisfy certain
search keys, retrieve only the names of personnel, retrieve
the phone number of a particular person, and so on.

3.  The  classification  scheme  could  serve  as  an  intermediate 
language  between  the  query  processor  and  the  retrieval  system. 


B.  RETRIEVAL  PROCESS

1.  Query  Format

Queries are presented to the system using the same format
as class expressions. The fully parenthesized expression contains
search keys and blank positions which specify the information to be
supplied to the user. The retrieval processor will fill in the blank
positions with all of the information contained in the data base which
satisfies the search keys. The expression must conform identically to
the fully parenthesized expression used to represent the class structure.

((DOE, JOHN, ___), (___, ___, ___), ___)
In the example above, the system will identify the class membership
of each search key and blank position through the classificatory
tree constructed from the class expression. A search is then instituted
for all records which contain an occurrence of "DOE" as a member of
the class "LAST" and "JOHN" as a member of the class "FIRST."
Information is extracted from those records to fill the
blank positions of the query. The user may broaden or narrow the
amount of information retrieved by the number and/or class of search
keys used in the query. A query containing only the search key
"CALIFORNIA" could produce a greater amount of information than a
query which has only one blank position.
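A Python sketch of blank filling follows. To stay short, it flattens each expression into its ordered end-node positions (so the structural check is reduced to counting positions) and stores each record as a dictionary from end-node class to a single value; it treats any run of underscores as a blank. These are simplifying assumptions, and the second record is invented for illustration.

```python
import re
from typing import Dict, List

# End-node classes of the personnel-record structure, in expression order.
END_CLASSES = ["LAST", "FIRST", "MIDDLE", "STREET", "CITY", "STATE", "TELEPHONE NO."]

def positions(expr: str) -> List[str]:
    """Flatten a fully parenthesized expression into its ordered end-node positions."""
    return [p.strip() for p in re.split(r"[(),]", expr) if p.strip()]

def is_blank(term: str) -> bool:
    return set(term) == {"_"}

def answer(query: str, records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    terms = positions(query)
    if len(terms) != len(END_CLASSES):
        raise ValueError("query does not conform to the class expression")
    keys = {c: t for c, t in zip(END_CLASSES, terms) if not is_blank(t)}
    blanks = [c for c, t in zip(END_CLASSES, terms) if is_blank(t)]
    return [
        {c: rec[c] for c in blanks}                       # fill in the blank positions
        for rec in records
        if all(rec.get(c) == v for c, v in keys.items())  # every search key must match
    ]

records = [
    dict(zip(END_CLASSES, positions("((DOE, JOHN, JAMES), (203 ELM STREET, MONTEREY, CA.), 384-9363)"))),
    dict(zip(END_CLASSES, positions("((ROE, MARY, ANN), (12 PINE AVENUE, MARINA, CA.), 384-6214)"))),
]
print(answer("((DOE, JOHN, ___), (___, ___, ___), ___)", records))
```

A query whose only search key sits in the STATE position returns both records, which mirrors the broadening effect described above.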

2.  Boolean  Expressions 

The ability to use Boolean expressions such as "and," "or,"
"not," etc., is desirable in any information retrieval package. However,
the degree to which Boolean expressions may be used is left to
the prerogative of the system designer in satisfying user needs. The
Boolean "and" is accepted by the retrieval processor in this
system and is identified by the ampersand:


((___, ___, ___), (___, MONTEREY & MARINA, CALIFORNIA), ___)


In this case, the names, street addresses, and phone numbers
of all personnel who live in Monterey, California, and in Marina,
California, would be produced.

Boolean "or" is not directly supported in this system, but
its effect is similar to that of alphabetic and numeric range requests.
3.  Alphabetic  and  Numeric  Ranges


Alphabetic and numeric range requests are identified by the
colon. Examples of range requests are exhibited below.

((A:D, ___, ___), (___, MONTEREY, CALIFORNIA), ___)

The retrieval processor identifies an alphabetic range request for all
data elements which are members of the class "LAST" and which have
as a first letter A, B, C, or D. The records of all personnel who live
in Monterey, California, and whose last names begin with A through D
inclusive would be produced.

Although the example immediately above uses single letters, the system
does not restrict alphabetic or numeric ranges to single characters: any
number of characters may be used, and any number of range requests are
possible within a single query.

The above discussion is also true for numeric range requests.
For example, suppose the user desires complete records for all personnel
who have specific telephone exchanges:

((___, ___, ___), (___, ___, ___), 372:394)
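How the processor evaluates a range or ampersand position is not spelled out here, so the Python sketch below adopts one plausible reading. A range `lo:hi` is treated as inclusive and compares only as many leading characters as each bound has (so `A:D` admits DOE, and `372:394` admits the exchange of 384-9363). An ampersand position accepts either listed value, matching the Monterey and Marina example. Both rules are assumptions, and the sample people are hypothetical.

```python
def term_matches(term: str, value: str) -> bool:
    """Test one data element against one search-key position (assumed semantics)."""
    if "&" in term:                                   # e.g. MONTEREY & MARINA
        return any(term_matches(t.strip(), value) for t in term.split("&"))
    if ":" in term:                                   # e.g. A:D or 372:394, inclusive
        lo, hi = (t.strip() for t in term.split(":", 1))
        return lo <= value[:len(lo)] and value[:len(hi)] <= hi
    return value == term                              # ordinary search key

# Hypothetical (last name, city) pairs.
people = [("BAKER", "MONTEREY"), ("DOE", "MARINA"), ("SMITH", "MONTEREY"), ("ADAMS", "SEASIDE")]
print([last for last, city in people
       if term_matches("A:D", last) and term_matches("MONTEREY & MARINA", city)])  # ['BAKER', 'DOE']
```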
VII.  SYSTEM  STRUCTURE


This section discusses the internal design of the general-purpose
fact-retrieval system employing the data-structure technique previously
explained. The system was implemented on the Naval Postgraduate
School's IBM 360 Model 67 computer and is an interactive system under
control of the Cambridge Monitor System (CP/CMS) [Ref. 13].

A.  DATA  FILES 

Data files are stored on punched cards and consist of the following
three types:

1. Format definition cards. These cards define the class
structure for each universe of discourse to be included in the data
base. An example of a format definition card is:

EMPLOYEE  RECORD  (NAME,  ADDRESS,  AGE,  CHILDREN) 

2. Class definition cards. These cards further define the
structure of the classes contained in the format definition. Examples
of class definition cards are:

NAME  (LAST,  FIRST) 

ADDRESS  (STREET,  CITY,  STATE)

3.  Data  records.  The  data  records  contain  the  data  elements 
associated  with  the  universe  of  discourse  and  are  fully  parenthesized 
expressions.  An  example  of  a  data  record  is: 

EMPLOYEE RECORD ((DOE, JOHN), (203 ELM STREET, MONTEREY, CA.),
(48), (MARY, SALLY))
Format definitions, class definitions, and data records may also
be entered into the system via on-line terminal. For a large-scale
data base, the data records could be stored in unstructured form on a
back-up storage device such as magnetic tape. Structuring of records
would be accomplished under program control according to pre-stored
format and class definitions.

B.  TREE-TYPE  DATA  STRUCTURES 

A tree-type data structure is employed to represent the hierarchical
classification of a universe of discourse. The tree-structuring
process described later in this section employs data cells to represent
nodes within a tree and the "chaining" technique to order the cells into
tree-structure form.

1.  Data  Cells

Data cells available to the tree-structuring processor consist
of three fields. The description and function of each field are given
below:

a.  The  identifier  field,  referred  to  as  "TOP,  "  contains  the 
storage  address  (pointer)  of  the  data  or  class  entity  which  the  data  cell 
represents. 

b.  The  right  link  field,  referred  to  as  "RIGHT,  "  contains  a 
pointer  which  is  used  to  chain  the  data  cell  to  another  data  cell  on  the 
same  level  of  the  tree. 

c. The down link field, referred to as "DOWN," contains
a pointer which is used to chain the data cell to another data cell
located in a lower level of the tree. Figure 5 demonstrates the use
of data cells. A zero in a link field signifies "no link" or a null field.
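A Python sketch of the three-field cell follows, assuming for illustration only that core storage is modelled as a list whose indexes serve as addresses, so that address 0 can be reserved to mean "no link." The usage lines chain a header cell and four class cells numbered as in the EMPLOYEE RECORD example; that the header's DOWN link points at the first class cell is an assumption about the figures.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataCell:
    top: int        # TOP: pointer to the represented entity (a sequence number here)
    right: int = 0  # RIGHT: address of the next cell on the same level; 0 = no link
    down: int = 0   # DOWN: address of a cell on the next lower level; 0 = no link

storage: List[DataCell] = [DataCell(0)]   # address 0 is reserved so that 0 can mean "null"

def new_cell(top: int) -> int:
    """Take a fresh cell from storage and return its address."""
    storage.append(DataCell(top))
    return len(storage) - 1

def level(address: int) -> List[int]:
    """Follow RIGHT links from a cell and collect the TOP values on that level."""
    tops = []
    while address != 0:
        tops.append(storage[address].top)
        address = storage[address].right
    return tops

# Header 1 (EMPLOYEE RECORD) with classes 2-5 (NAME, ADDRESS, AGE, CHILDREN) chained beneath it.
header = new_cell(1)
previous = first = new_cell(2)
for top in (3, 4, 5):
    address = new_cell(top)
    storage[previous].right = address   # chain along the same level
    previous = address
storage[header].down = first            # the header points down to its first class
print(level(storage[header].down))      # [2, 3, 4, 5]
```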

2.  Structuring  Process 

Empty data cells are constructed in core storage through
list-structuring techniques and are stored in an area available to the
tree-structuring routine. The reading of a format definition card
initiates the structuring process. The format name (e.g., EMPLOYEE
RECORD) and the class names contained on the card are extracted
and moved into storage (a discussion of this process is deferred to a
later section). A number of cells equal to one (for the format name)
plus the number of class names contained on the card is retrieved, and
tree structuring commences. The first cell in the tree structure is called
a "header" and serves to identify the format name of the tree. Each
of the classes contained in the format definition is assigned to a data
cell, and the cells are chained together. Figure 6 shows the structure
representing the format definition:

EMPLOYEE  RECORD  (NAME,  ADDRESS,  AGE,  CHILDREN) 

Before completing the discussion of tree structuring, it is important
to note that class definitions throughout the various universes of
discourse in the data base must be consistent. That is to say, if the
class called "NAME" is defined as (LAST, FIRST), then every occurrence
of "NAME" must consist of the classes "LAST" and "FIRST." If this is
not done, confusion arises during the retrieval process when the
processor attempts to identify the class memberships of data
elements. Therefore, as each format definition is read, a search is
conducted of all previously constructed trees to determine whether or
not each of the classes contained in the definition being processed
has been previously used. If a class has been previously used, then
the tree structure representing the class is appended to the tree being
built. If a class has not been previously used, then a class definition
card must be submitted to the tree-structuring processor.

NOTE: The numbers in the TOP fields are sequence numbers.

Figure 5

Data Cell Composition

EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)

The numbers in the TOP fields correspond to:

1  EMPLOYEE RECORD
2  NAME
3  ADDRESS
4  AGE
5  CHILDREN

Figure 6

Tree Structure Composed of Data Cells

After the format definition card has been processed, any class
definition cards associated with the structure are processed. Figure 7
contains a completed tree structure for:

EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)
NAME (LAST, FIRST)
ADDRESS (STREET, CITY, STATE)
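A Python sketch of these structuring steps follows. For brevity it assumes that cells are linked by object references instead of numeric addresses, and that a reused class shares its existing subclass chain rather than copying it (the text says only that the structure is "appended"). `read_format` and `read_class` are hypothetical names for the card-processing steps, and the PAY RECORD format is invented to show reuse.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional

@dataclass
class Cell:
    top: str                          # class or format name (stands in for the TOP pointer)
    right: Optional["Cell"] = None    # next cell on the same level
    down: Optional["Cell"] = None     # first cell on the next lower level

subclass_chains: Dict[str, Cell] = {}  # class name -> first cell of its subclass chain
trees: List[Cell] = []                 # header cells of all format trees built so far

def chain(names: List[str]) -> Cell:
    """Assign one cell per name and chain them along RIGHT links."""
    cells = [Cell(name) for name in names]
    for left, right in zip(cells, cells[1:]):
        left.right = right
    return cells[0]

def walk(cell: Optional[Cell]) -> Iterator[Cell]:
    """Visit a cell, everything below it, then the cells to its right."""
    while cell is not None:
        yield cell
        yield from walk(cell.down)
        cell = cell.right

def read_format(name: str, classes: List[str]) -> Cell:
    header = Cell(name, down=chain(classes))
    for cell in list(walk(header.down)):
        if cell.top in subclass_chains:        # class used before: append its existing structure
            cell.down = subclass_chains[cell.top]
    trees.append(header)
    return header

def read_class(name: str, subclasses: List[str]) -> None:
    first = chain(subclasses)
    subclass_chains[name] = first
    for tree in trees:                         # hang the new level under every cell of that class
        for cell in list(walk(tree)):
            if cell.top == name and cell.down is None:
                cell.down = first

employee = read_format("EMPLOYEE RECORD", ["NAME", "ADDRESS", "AGE", "CHILDREN"])
read_class("NAME", ["LAST", "FIRST"])
read_class("ADDRESS", ["STREET", "CITY", "STATE"])
print([cell.top for cell in walk(employee)])
pay = read_format("PAY RECORD", ["NAME", "SALARY"])   # reuses NAME instead of rebuilding it
print(pay.down.down is subclass_chains["NAME"])       # True
```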

C.  INDEX  FILES 

The system incorporates an index file, called the master index,
which demonstrates many of the characteristics and advantages of an
inverted index. The master index contains format names, class names,
and data elements. Each entry in the index has a pointer associated
with it which links the entry to a tree structure, data record, or
further information concerning the entry. The retrieval process is
always initiated at the master index, since it is the agent which directs
the search for information in response to a user's query.
EMPLOYEE RECORD (NAME, ADDRESS, AGE, CHILDREN)
NAME (LAST, FIRST)
ADDRESS (STREET, CITY, STATE)

The numbers in the TOP fields correspond to:

1  EMPLOYEE RECORD
2  NAME
3  ADDRESS
4  AGE
5  CHILDREN
6  LAST
7  FIRST
8  STREET
9  CITY
10 STATE

Figure 7

Tree Structure for the Format "EMPLOYEE RECORD"

1.  Characteristics  of  the  Master  Index 


Conceptually, the master index is a large matrix consisting
of fixed-length records (matrix rows), each containing eight fields
(matrix columns), as shown in Figure 8. The first four characters of
format names, class names, and data elements are stored in the first
four fields of the index. Entries which contain more than four characters
are then stored in a sequential storage area reserved for
variable-length records. The remaining four fields of each index
record contain information concerning the type of entry (e.g., format
name, class name, or data element), the sequential store address of
the full character representation of the entry, if any, pointers to
information-bearing data cells, and other information useful to the
retrieval processor.

2.  Constructing  the  Master  Index 

The first record of the master index is reserved as a table
of all format names contained in the data base. The first record contains
the address of the first data cell (identical to the data cells used
in tree structuring) in a chain of cells, and each cell contains the
address of a format name located in sequential storage. Through this
record a user may quickly determine the partitioning of the data base.
Figure 9 demonstrates the idea.

Format names are entered in the index and linked to their
definitions, which are located in sequential storage. Each of the classes
contained in the format definitions is also stored in the index.
COLUMN(S)

1-4 : First 4 characters of the entry

5   : No, if the entry is a format name
      "C" if the entry is a class name
      "L" if the entry is the lowest-level class in a tree structure
      "D" if the entry is a data element

6   : Pointer to the full character representation in sequential store

7   : Pointer to associated chain of data cells if the entry is
      classified "L"; otherwise pointer to sequential store

8   : Pointer to associated data cell in the tree structure if the
      entry is a class or format; pointer to associated chain of
      data cells if the entry is a data element

Figure 8

Representation of the Master Index
Figure 9

Reserved Record in the Master Index for Format Names with
Associated Data Cells and Format Names in Sequential Store
Associated with each class entry in the index is a string of data cells
which contain two items of information concerning the class:

a. The first field contains the number of the data record which,
in turn, contains an occurrence of the class. (This information is
added when the data records are read and is discussed later.)

b. The second field contains a number corresponding to the format
name which contains this class entry.

A  class  may  be  used  in  any  number  of  different  format  definitions 
but  its  structure  must  be  consistent  in  every  occurrence.  Therefore, 
regardless  of  the  number  of  format  definitions  which  contain  a  given 
class,  there  is  only  one  index  record  for  the  class.  The  data  cells 
appended  to  the  class  entry  provide  the  retrieval  processor  with  data 
such  as  the  format  definitions  in  which  the  class  appears.  Among 
other  things,  information  pertaining  to  the  class  entries  provides  the 
retrieval  processor  with  the  capability  of  quickly  abandoning  a  search 
when  a  user  requests  information  through  a  class  which  is  not  a 
member  of  the  format  being  queried. 

Class definitions are processed in a manner very similar to format
definition processing. The class being defined is entered in the
index, and the definition is stored as read in the sequential store. The
system returns the sequential store address and enters it in the index
record. Appropriate data cells are appended to the index, and the
class structure is added to the classificatory tree. When the tree is
completed, those classes which are end nodes in the classificatory
tree (e.g., LAST, FIRST, STREET, CITY, STATE, AGE, and
CHILDREN in EMPLOYEE RECORD) are identified and their index
records are flagged. This is done to ensure that elements in the data
records are mapped onto the tree structure according to their proper
class membership.

As each data record is read into the system, it is assigned a
unique number and placed in the sequential store. Each element within
the record is examined to determine its class membership, and the
master index is searched to determine whether the element was previously
entered by another data record. The possibility of a data element
appearing in more than one record exists if the data base contains
similar formats, such as employee records and pay records. In addition,
a data element may be a member of more than one class, such
as the occurrence of "JOHN" as a member of both classes "FIRST"
and "CHILDREN." It is highly desirable that there be only one entry
in the master index for those elements which occur more than once.
Unique entries in the index guarantee that when an item is located in
the index, the search process is complete and successful. Additionally,
the need for combined search plans is eliminated. Specific record
and class membership information for each data element entered in
the index is resolved by appending data cells to the master index entry.
The data cells contain the record number(s) from which the element
was extracted and its class membership(s). Assuming that a data
element occurs several times in the data base, the master index would
still contain only one record for the element. The record contains all
of the information pertinent to the retrieval process. The technique,
relevant to both class and data entries, results in two important
savings:


1.  A  significant  reduction  of  storage  space  is  realized  (if  an 
element  occurs  several  times)  since  multiple  entries  in  the  master 
index  require  more  storage  space  than  a  single  record  and  its 
associated  data  cells. 

2. A significant reduction in search time is realized, since multiple
entries would require the retrieval processor to conduct a full-file
search each time it enters the master index.
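The following is a minimal Python sketch of the single-entry master index. It uses a dictionary keyed by the element's characters in place of the fixed-length index rows and chained cells of the program listing. `MasterIndex`, `add`, and `lookup` are hypothetical names. Each data cell here holds only a record number and a class name.

```python
from collections import defaultdict

# A data cell records where an element came from: (record number, class).
Cell = tuple[int, str]


class MasterIndex:
    """Sketch of a master index holding one entry per distinct element.

    Repeated occurrences do not create new entries; they append data cells
    to the existing entry, so one lookup returns every occurrence.
    """

    def __init__(self) -> None:
        self.entries: defaultdict[str, list[Cell]] = defaultdict(list)

    def add(self, element: str, record_no: int, cls: str) -> None:
        self.entries[element].append((record_no, cls))

    def lookup(self, element: str) -> list[Cell]:
        # .get() avoids creating an empty entry for a missing element
        return self.entries.get(element, [])


index = MasterIndex()
index.add("JOHN", 1, "FIRST")     # JOHN as a first name in record 1
index.add("JOHN", 2, "CHILDREN")  # JOHN as a child's name in record 2
print(len(index.entries), index.lookup("JOHN"))
# 1 [(1, 'FIRST'), (2, 'CHILDREN')]
```

Repeated occurrences only lengthen the cell list. A single probe for "JOHN" therefore finds both its FIRST and CHILDREN memberships, which is the search-time saving described in item 2.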

3.  Data  Record  Table 


Cells appended to each data element stored in the master
index do not contain the sequential store addresses of the records
from which the data elements were extracted. This information is
stored separately in a table referred to as a data record table. The
data record table augments the information contained in the master
index and is composed of fixed-length records as shown in Figure 10.
Each table record consists of three fields which contain:

a. The unique data record number.

b. Format membership of the data record.

c. Sequential store address of the data record.

The data record table serves two functions:

a. The retrieval processor bypasses the master index and directly
enters the data record table to satisfy requests for all data records
which are members of a particular universe of discourse.

b. The table is also utilized for queries other than those which
request "all data records." The retrieval processor searches the
master index to determine the data records which satisfy a user's
request. Then the processor enters the data record table and extracts
the sequential store addresses of the records. The sequential store
addresses are passed to the "output" section of the retrieval processor.

Figure 10
Representation of the Data Record Table

Column 1: Unique record number
Column 2: Format membership of the data record
Column 3: Pointer to data record in sequential store

FORMAT NUMBER   FORMAT NAME
1               EMPLOYEE RECORD
2               CAR REGISTRATION
3               PAY RECORD
4               STOCK INVENTORY

Sequential store (excerpt):
((DOE, JOHN), (80 WHITNEY, ...
((SMITH, BILL), (32 CAPITAL ...
(...041305416), (WRENCH) ...
((DOE, JOHN), (094-63-3152), ...
((EA 3733, CONN), (BUICK, ...

The information contained in the data record table is tabulated
separately from the master index to achieve savings in storage space
and response time. Storage savings are realized since the addresses
of data records in the sequential store are contained only in the data
record table and are not replicated in the master index for each class
and data element. System response time is reduced for queries that
request all data records of a particular universe of discourse, since
the data record table was designed primarily to expedite this type of
request. The retrieval processor extracts all of the necessary data
record addresses in one access of the table. The amount of searching
within the table is minimal.
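The two functions of the data record table can be sketched as follows. This assumes the table is an in-memory list of fixed-length rows, with records numbered consecutively from 1 as the text implies. `TableRow` and both function names are hypothetical. The sample addresses are invented for illustration, and the format numbers follow Figure 10.

```python
from typing import NamedTuple


class TableRow(NamedTuple):
    record_no: int  # unique data record number
    format_no: int  # format membership of the data record
    address: int    # position of the record in the sequential store


def addresses_for_format(table: list[TableRow], format_no: int) -> list[int]:
    """'All records of a format': read the table directly, skip the index."""
    return [row.address for row in table if row.format_no == format_no]


def addresses_for_records(table: list[TableRow], record_nos: list[int]) -> list[int]:
    """Translate record numbers found via the master index into addresses.

    Records are numbered 1, 2, 3, ... as they are read, so the row for
    record n sits at table[n - 1] and no search is needed.
    """
    return [table[n - 1].address for n in record_nos]


# Hypothetical contents: format numbers follow Figure 10, addresses are made up.
table = [TableRow(1, 1, 7), TableRow(2, 1, 61), TableRow(3, 3, 118), TableRow(4, 2, 160)]
print(addresses_for_format(table, 1))        # [7, 61]
print(addresses_for_records(table, [3, 4]))  # [118, 160]
```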

D.  INFORMATION  FILE 

The  "sequential  store"  is  the  system  information  file,  or  data 
base.  It  contains  the  data  records,  format  definitions,  class  definitions, 
and  the  full  character  representation  of  those  entries  in  the  master 
index  consisting  of  more  than  four  characters.  Figure  11  shows  the 
sequential  store  and  its  relationship  to  the  master  index  and  the  data 
record  table. 

The information file is resident in main core storage. The
variable-length records of this file are sequentially ordered. System
information files are not normally stored in main core unless they are
relatively small (which is the case here). A larger file, however, must
be resident on a direct access storage device in order to provide
satisfactory system response time.

Figure 11
Relationship between the Master Index, Data Record Table, and
Sequential Store
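The following is a minimal sketch of the sequential store and of the four-character master-index key. Following the program listing, it assumes each stored item is terminated by an asterisk. `SequentialStore` and `index_key` are hypothetical names. `None` stands in for the listing's zero pointer when a name fits in four characters.

```python
from typing import Optional

STAR = "*"  # terminator written after every item in the sequential store


class SequentialStore:
    """Sketch: a single character array of variable-length, '*'-terminated items."""

    def __init__(self) -> None:
        self.chars: list[str] = []

    def append(self, text: str) -> int:
        """Store `text` followed by '*' and return its starting address."""
        if STAR in text:
            raise ValueError("'*' is reserved as the item terminator")
        start = len(self.chars)
        self.chars.extend(text)
        self.chars.append(STAR)
        return start

    def read(self, address: int) -> str:
        """Return the characters from `address` up to the next '*'."""
        end = self.chars.index(STAR, address)
        return "".join(self.chars[address:end])


def index_key(name: str, store: SequentialStore) -> tuple[str, Optional[int]]:
    """Master-index key: the first four characters, plus a pointer to the
    full name in the store only when the name is longer than four characters."""
    pointer = store.append(name) if len(name) > 4 else None
    return name[:4], pointer


store = SequentialStore()
print(index_key("EMPLOYEE RECORD", store))  # ('EMPL', 0)
print(index_key("DOE", store))              # ('DOE', None)
print(store.read(0))                        # EMPLOYEE RECORD
```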

E.  RETRIEVAL  PROCESSOR 

The  retrieval  processor  is  divided  into  three  operations.  The 
identification  operation  determines  the  type  of  query  posed  by  the 
user;  the  search  operation  determines  the  data  record  numbers  which 
satisfy  the  user's  request;  the  output  operation  retrieves  the  resultant 
data  records  from  the  sequential  store  and  prints  them  at  the  terminal. 
Additionally,  special  messages  are  output  to  the  user  in  the  form  of 
error  messages  to  warn  him  of  invalid  queries,  and  messages  which 
notify  him  of  unsatisfied  queries. 

1.  Query  Types 

The IR system designer strives to achieve total utility of the
system by providing the user with a powerful retrieval language.
Utility of the data structure used in this system is realized by the
various types of queries available to the user for extracting information
from the data base. There are four major types of queries available
to the user.

a.  Determining  Data  Base  Partitions. 

As previously discussed, the data base may be partitioned
to allow a mix of unrelated information by defining the class structure
of each universe of discourse in the data base. A user who is
unfamiliar with the data base partitions (format names) may easily
determine this information by submitting a special type of query. The
format of the query is simple and consists of the single search key
"CLASS." This is translated by the retrieval processor as: "Output
the names of all formats contained in the data base." Search of the
master index is then centered at the first record of the index and its
associated chain of data cells, which contain the sequential store
addresses of the format names. All format names contained in the data
base are output to the user.

QUERY:     CLASS
RESPONSE:  EMPLOYEE RECORD
           PAY RECORD
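The following sketch shows how the CLASS query might be answered. It assumes that the cells chained to the reserved first index record hold a format number and the sequential store address of the format name. The store contents, the format numbers, and the function names are hypothetical.

```python
STAR = "*"


def read_item(seq: str, address: int) -> str:
    """Return the item stored at `address`, up to the next '*'."""
    return seq[address:seq.index(STAR, address)]


def class_query(class_cells: list[tuple[int, int]], seq: str) -> list[tuple[int, str]]:
    """Answer the query CLASS: follow the cells chained to the reserved
    first index record and fetch each format name from the sequential store."""
    return [(fmt_no, read_item(seq, addr)) for fmt_no, addr in class_cells]


# Hypothetical store: 'CLASS' occupies addresses 0-4, its '*' is at 5.
seq = "CLASS*EMPLOYEE RECORD*PAY RECORD*"
cells = [(1, 6), (3, 22)]  # (format number, address of the format name)
for fmt_no, name in class_query(cells, seq):
    print(name)
# EMPLOYEE RECORD
# PAY RECORD
```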


b.  Determining  Format  and  Class  Definitions. 

In  order  to  extract  data  from  a  specific  universe  of  discourse, 
the  user  must  be  provided  with  its  class  structure.  The  class  structure 
determines  the  format  for  data  record  requests.  Queries  of  format 
and  class  definitions  must  contain,  as  a  search  key,  the  format  name 
or  class  name  to  be  defined.  The  search  processor  enters  the  master 
index  to  locate  the  format  name  or  class  name,  extracts  the  address 
of  its  definition  located  in  the  sequential  store,  and  the  definition  is 
output  directly  at  the  terminal. 
QUERY:     EMPLOYEE RECORD
RESPONSE:  (NAME, ADDRESS, AGE, CHILDREN)

QUERY:     NAME
RESPONSE:  (LAST, FIRST)

QUERY:     AGE
RESPONSE:  NO DESCENDANTS
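The decision made for a format- or class-definition query can be sketched as below. This is an illustration under stated assumptions. Definitions are kept in a dictionary. The wording of the "not found" and "data element" messages is modelled on the FORMAT statements in the program listing rather than on a documented interface. `definition_query` and `DEFS` are hypothetical names.

```python
def definition_query(name: str,
                     definitions: dict[str, list[str]],
                     end_classes: set[str],
                     data_elements: set[str]) -> str:
    """Respond to a format-name or class-name query (hypothetical messages
    modelled on the responses above and in the program listing)."""
    if name in definitions:
        return "(" + ", ".join(definitions[name]) + ")"
    if name in end_classes:
        return "NO DESCENDANTS"
    if name in data_elements:
        return f"INVALID QUERY: {name} IS A DATA ELEMENT"
    return f"REQUEST NOT FULFILLED: {name} WAS NOT FOUND"


DEFS = {"EMPLOYEE RECORD": ["NAME", "ADDRESS", "AGE", "CHILDREN"],
        "NAME": ["LAST", "FIRST"]}
print(definition_query("EMPLOYEE RECORD", DEFS, {"AGE"}, {"DOE"}))
# (NAME, ADDRESS, AGE, CHILDREN)
print(definition_query("NAME", DEFS, {"AGE"}, {"DOE"}))  # (LAST, FIRST)
print(definition_query("AGE", DEFS, {"AGE"}, {"DOE"}))   # NO DESCENDANTS
```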


c. Data Element Retrieval.

One asset of the data structure concept is that it allows the
user to extract single data elements from the data base which are
members of a particular class and format, or members of a particular
class irrespective of format membership. Since data elements
are mapped onto the end nodes of their respective tree structures, the
user must use the lowest level classes of the structure as search keys.
Failure to do so prompts the retrieval processor to output corrective
information to the user. The hyphens in the queries below indicate to
the retrieval processor that the expressions are queries and not format
definitions. The processor could identify the expression by
searching the master index for an occurrence of "EMPLOYEE RECORD."
A successful search would indicate that a format definition already
existed in the system. However, use of the hyphen is a simpler and
faster method for positively identifying the type of expression
submitted to the system.

QUERY:     EMPLOYEE RECORD (NAME, -)
RESPONSE:  INVALID QUERY:
           DETERMINE DESCENDANTS OF: NAME
           USE DESCENDANTS AS KEYWORDS

QUERY:     EMPLOYEE RECORD (LAST, -)
RESPONSE:  BROWN
           SMITH
           THOMPSON

To answer the above query, a search is conducted in the master
index for all data elements which are members of the class "LAST"
and are members of the format "EMPLOYEE RECORD." This information
is contained in the data cells appended to each data entry in the
index. Elements which satisfy the query are taken directly from the
master index and output at the terminal.

In the query below, the hyphen is used to differentiate between a
query and a class definition statement. All data elements which are
members of the class "LAST" are output irrespective of format
membership. The format membership fields of the data cells are
ignored during the search of the master index.

QUERY:     LAST (-)
RESPONSE:  BROWN
           CHAMBERS
           COTTLE
           DOE
           SMITH
           THOMPSON
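The following sketch shows data element retrieval with and without a format restriction. It assumes the master index is a dictionary of data cells (record number, class). Format membership is found through a record-number-to-format map that stands in for the data record table. All data and names are hypothetical, and the alphabetical ordering of the output is an assumption.

```python
from typing import Optional

# Hypothetical data: element -> data cells (record number, class), and
# record number -> format, standing in for the master index and the
# data record table.
INDEX: dict[str, list[tuple[int, str]]] = {
    "BROWN": [(2, "LAST")],
    "CHAMBERS": [(5, "LAST")],
    "DOE": [(1, "LAST"), (4, "LAST")],
    "JOHN": [(1, "FIRST"), (2, "CHILDREN")],
}
RECORD_FORMAT = {1: "EMPLOYEE RECORD", 2: "EMPLOYEE RECORD",
                 4: "PAY RECORD", 5: "PAY RECORD"}


def elements_of_class(cls: str, fmt: Optional[str] = None) -> list[str]:
    """Elements with at least one data cell in class `cls`; when `fmt` is
    given, the cell's record must also belong to that format."""
    hits = []
    for element, cells in INDEX.items():
        if any(c == cls and (fmt is None or RECORD_FORMAT[rec] == fmt)
               for rec, c in cells):
            hits.append(element)
    return sorted(hits)  # sorted for display; an assumption, not from the text


print(elements_of_class("LAST", "EMPLOYEE RECORD"))  # ['BROWN', 'DOE']
print(elements_of_class("LAST"))  # ['BROWN', 'CHAMBERS', 'DOE']
```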
d. Data Record Retrieval.


Data record retrieval is the most valuable type of request available
to the system user and would be the most frequently used. It extracts
complete data records which satisfy the search keys contained in the
query. To retrieve data records, the queries contain data elements as
search keys and may contain the Boolean "AND," alphabetic and/or
numeric ranges, or any combination thereof. The query format is a fully
parenthesized expression as shown in previous sections. Search keys
are positioned in the expression according to their class membership,
and hyphens are inserted in those positions for which information is
requested. Any deviation from the properly parenthesized expression
prompts error messages from the retrieval processor to the user.
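Before class memberships can be assigned, the expression must at least be balanced. The check below is a hedged sketch in the spirit of the parenthesis counter in the program listing. Unlike a plain counter, it also rejects a closing parenthesis that appears before its opening partner. The function name is hypothetical.

```python
def check_parentheses(query: str) -> None:
    """Raise ValueError unless every '(' in `query` is matched by a later ')'."""
    depth = 0
    for pos, ch in enumerate(query):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no open '(' before it
                raise ValueError(f"unmatched ')' at position {pos}")
    if depth:
        raise ValueError(f"{depth} unclosed '(' in query")


check_parentheses("EMPLOYEE RECORD ((DOE, -), (-, -, CA), (-), (-))")  # passes
```

Balance alone does not guarantee a well-formed query. A balanced expression can still place a key in the wrong group, which is why the processor also checks class and format membership.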

The retrieval process for the query listed below is explained in
the following paragraphs:

EMPLOYEE RECORD ((DOE, -), (-, -, CA), (-), (-))

The format name appearing at the beginning of the query expression
informs the retrieval processor of the universe of discourse in
which the user is interested. The processor then traverses the tree
structure for "EMPLOYEE RECORD" to determine the lowest level
classes in the tree. This information, in conjunction with the proper
use of parentheses in the query expression, allows the processor to
identify the class memberships of the search keys contained in the
query. The user is notified whenever the processor is unable to find a
search key in the master index. In this case, the processor attempts
to recover data which satisfies the remaining search keys. Similar
action takes place when the processor encounters a search key which
is not a member of the class specified in the query, or a search key
which is not a member of the format specified in the query.
Additionally, the user is notified whenever the query is improperly
formatted.
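The positional mapping of search keys onto end-node classes can be sketched as follows. This is a simplification, not the author's algorithm. It flattens the nested groups and relies only on left-to-right position, whereas the processor described above also uses the grouping. The end-node order (with ADDRESS assumed to be STREET, CITY, STATE) and the function name are assumptions.

```python
import re


def keys_by_class(query: str, end_classes: list[str]) -> tuple[str, dict[str, str]]:
    """Split a data-record query into its format name and a mapping from
    end-node class to search key; hyphen positions are dropped."""
    fmt, paren, body = query.partition("(")
    if not paren:
        raise ValueError("INVALID QUERY: no parenthesized expression")
    # Simplification: flatten the nested groups and rely on left-to-right
    # position only; the processor described above also uses the grouping.
    tokens = [t.strip() for t in re.split(r"[(),]", body) if t.strip()]
    if len(tokens) != len(end_classes):
        raise ValueError("INVALID QUERY: keyword positions do not match the format")
    keys = {cls: tok for cls, tok in zip(end_classes, tokens) if tok != "-"}
    return fmt.strip(), keys


END = ["LAST", "FIRST", "STREET", "CITY", "STATE", "AGE", "CHILDREN"]
print(keys_by_class("EMPLOYEE RECORD ((DOE, -), (-, -, CA), (-), (-))", END))
# ('EMPLOYEE RECORD', {'LAST': 'DOE', 'STATE': 'CA'})
```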

Each search key in the query is processed sequentially. The
retrieval processor searches the master index for an occurrence of
each key. Record numbers which contain an occurrence of the search
key are extracted and stored in a list. After all search keys have been
processed, the retrieval processor "ANDS" the record numbers in the
list to determine which records satisfy the query. For example,
assuming that two key words are used and record numbers 5, 32, and
67 satisfy the first key word, and record numbers 32 and 67 satisfy
the second key word, records 32 and 67 are output to the user. Record
numbers which satisfy the query are passed to the "output" section
of the retrieval processor, which retrieves the sequential store
addresses of the records from the data record table and prints the
records at the terminal.
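The "AND" step can be sketched as a set intersection over the record-number lists. Following the text, a key that is not found in the master index is reported and skipped, so the remaining keys can still be satisfied. The function name and the dictionary input are hypothetical.

```python
from typing import Optional


def and_search(hits_per_key: dict[str, Optional[list[int]]]) -> tuple[list[int], list[str]]:
    """Intersect the record-number lists of all search keys ("AND").

    A key mapped to None was not found in the master index; it is reported
    and skipped so the remaining keys can still be satisfied.
    """
    missing = [key for key, hits in hits_per_key.items() if hits is None]
    found = [set(hits) for hits in hits_per_key.values() if hits is not None]
    if not found:
        return [], missing
    return sorted(set.intersection(*found)), missing


# The worked example from the text: records 5, 32, 67 and 32, 67.
print(and_search({"KEY1": [5, 32, 67], "KEY2": [32, 67]}))  # ([32, 67], [])
print(and_search({"KEY1": [5, 32, 67], "NOPE": None}))      # ([5, 32, 67], ['NOPE'])
```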

A user has the ability to immediately examine the results of his
query since the system is interactive. The results of one query may
prompt the user to submit another request, either broadening or
narrowing the request through judicious use of search keys. In any
case, the user is guaranteed that if the information that he seeks is
contained in the data base, he will have quick and easy access to it.
Appendix A contains a sample run of the fact-retrieval system and
demonstrates all of the queries available to a user and the system
responses.

F.  ALTERING  THE  DATA  BASE 

1. Changes and Deletions

Due  to  the  experimental  nature  of  the  system,  no  utility 
routines  have  been  provided  for  deleting  records  or  making  changes 
to  existing  records.  Alterations  are  accomplished  by  manually 
changing  the  card  images  in  the  data  files. 

2.  Additions 

The addition of data records to existing data sets or the submission
of new universes of discourse is accomplished easily, without special
utility routines. This feature is inherently built into the system
through the data structuring technique. Addition of a new universe of
discourse is accomplished by submitting format and class definitions,
and associated data records, either on-line through the terminal
(automatically) or off-line with card images (manually). New data
records may also be added to existing data files automatically or
manually.
VIII. CONCLUSIONS


Characteristics of the data-structuring concept as used in a
general-purpose fact-retrieval system have been discussed throughout
the preceding sections. These concepts are summarized here.

The data structuring technique encompasses the concept of
hierarchical classification, which is the most widely used method of
indexing. Hierarchical classification of data is a relatively simple
technique to use but possesses the power to divide and subdivide a
universe of discourse into more specific subjects. Additionally,
hierarchical structures may be created to include a domain of subjects.
As previously demonstrated, this is advantageous in a fact-retrieval
system because it allows a mix of structures in a single data base.
Therefore, users with differing interests are provided simultaneous
access to a single system, since each is provided a "personal"
retrieval system within a larger retrieval system. In addition, the
hierarchical structure provides a user with multiple avenues of access
into his information file.

Parenthesized expressions serve as an intermediate language
between the query processor and the information retrieval system.
The query processor is able to determine the class memberships of
elements within an expression by examination of the parenthesized
form. It is apparent, however, that the use of parenthesized
expressions is cumbersome and demanding, since misplacing parentheses
is easy to do and causes loss of meaning of the expression. On the
other hand, it can be argued that the technique of parenthesizing
expressions is powerful and that an equally powerful substitute is
difficult to devise.


APPENDIX  A 
COMPUTER PROGRAM

      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION FTAB(30)
      DIMENSION UPRAN(30)
      DIMENSION RECTAB(100)
      DIMENSION TOP(603),RIGHT(602),DOWN(601)
      DIMENSION WORK(241),ACUM(241),FRST4(241)
      DIMENSION LEVEL(50),DRTAB(50,3),DESTAB(30),NODTAB(50)
      DIMENSION RNO(603),CMEM(602),NEXT(601)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000)
      COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
      COMMON/TWO/RNO,AVAIL2,S,R,CM,RM
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      COMMON/FOUR/NODTAB,FD
      COMMON/FIVE/MULT,MK,EK
      COMMON/SIX/S2,AT,AE,HCTR,UPRAN
      EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
      EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
      DATA OP/'('/,CP/')'/,BLANK/' '/,COMMA/','/,STAR/'*'/,
     1EQS/'='/,CX/'C'/,DX/'D'/,DOLS/'$'/
      DATA LX/'L'/,AX/'A'/,SX/'S'/,COL/':'/,
     1AMP/'&'/,HYP/'-'/

C - INITIALIZE ARRAYS, CERTAIN COUNTERS AND SUBSCRIPTS.
      AVAIL1=1
      Q=AVAIL1
      AVAIL2=1
      S=AVAIL2
      R=S
      CALL INIT1
      CALL INIT2
      TERM=0
      ER=0
      FT=0
      MI=1
      FNUM=0
      RNUM=0
      DO 20 I=1,50
      DO 21 J=1,3
      DRTAB(I,J)=BLANK
   21 CONTINUE
   20 CONTINUE
      DO 23 I=1,500
      MINDEX(I,7)=BLANK
      DO 24 J=1,4
      MINDEX(I,J)=BLANK
   24 CONTINUE
   23 CONTINUE
      DO 26 I=1,30
      FTAB(I)=BLANK
   26 DESTAB(I)=BLANK
C - RESERVE THE FIRST ROW OF THE MASTER INDEX FOR 'CLASS'.
C - MINDEX(1,8) POINTS TO CELLS WHICH CONTAIN FORMAT
C - NUMBERS, AND POINTERS TO THE FULL CHARACTER REPRESENTA-
C - TION OF THE FORMAT NAME IN SEQUENTIAL STORAGE. CELLS
C - ARE ATTACHED AS THE RECORD FORMATS ARE PROCESSED.
      MINDEX(1,1)=CX
      MINDEX(1,2)=LX
      MINDEX(1,3)=AX
      MINDEX(1,4)=SX
      MINDEX(1,5)=0
      MINDEX(1,6)=1
      MINDEX(1,8)=0
      SEQ(1)=CX
      SEQ(2)=LX
      SEQ(3)=AX
      SEQ(4)=SX
      SEQ(5)=SX
      SEQ(6)=STAR
      SI=6

   44 DO 28 I=1,30
   28 SERCH(I)=BLANK
   45 J=1
      K=80
      PCTR=0
      DO 27 I=1,241
      WORK(I)=BLANK
   27 ACUM(I)=BLANK
      IF(TERM.EQ.1.OR.ER.EQ.1) GO TO 399
C - READ ONE RECORD INTO THE 'WORK' ARRAY. DETERMINE
C - IF THE PARENTHESES ARE BALANCED OR IF THE RECORD
C - EXCEEDS 240 CHARACTERS.
   46 READ (4,2,END=900) (WORK(I),I=J,K)
    2 FORMAT (80A1)
      GO TO 41
  399 READ (5,2) (WORK(I),I=J,K)
      IF(WORK(1).EQ.DOLS.OR.WORK(2).EQ.DOLS.OR.WORK(3)
     1.EQ.DOLS) GO TO 1000
   41 DO 30 L=J,K
      IF(WORK(L).EQ.OP) PCTR=PCTR+1
      IF(WORK(L).EQ.CP) PCTR=PCTR-1
      IF(WORK(L+1).EQ.STAR) GO TO 32
   30 CONTINUE
      J=J+80
      IF(K.GT.240) GO TO 950
      IF(TERM.EQ.1.OR.ER.EQ.1) GO TO 399
      GO TO 46
   32 IF(PCTR) 925,40,925

C - DEBLANK THE RECORD CONTAINED IN WORK, LOAD IT INTO THE
C - ACCUMULATOR, AND DETERMINE ITS LENGTH.
   40 ER=0
      J=0
      DO 47 I=1,241
      IF(WORK(I).EQ.BLANK) GO TO 47
      J=J+1
      ACUM(J)=WORK(I)
      IF(WORK(I).EQ.STAR) GO TO 48
   47 CONTINUE
   48 N=J-1
C - DETERMINE THE RECORD TYPE AND BRANCH TO THE
C - APPROPRIATE BLOCK OF CODE FOR PROCESSING.
      CALL IDENT(&600,&800,&700)


C - THIS BLOCK OF CODE PROCESSES INPUT RECORDS,
C - I.E., FORMAT DEFINITIONS, CLASS DEFINITIONS, AND DATA
C - RECORDS. FORMAT OR CLASS TREES ARE STRUCTURED, ENTRIES
C - MADE IN THE MASTER INDEX, DATA RECORD TABLE, AND
C - SEQUENTIAL STORAGE.
      IF(ACUM(1).EQ.EQS) GO TO 251
      DO 50 I=1,N
      MK=I
      IF(ACUM(I).EQ.OP) GO TO 55
   50 SERCH(I)=ACUM(I)
   55 IF(ACUM(MK+1).NE.CP) GO TO 56
      WRITE(6,280)
  280 FORMAT(1H ,'INVALID QUERY: MISSING HYPHEN')
      GO TO 44
   56 NN=N
      N=MK-1
      CALL MISRCH
      IF(ANS.EQ.0) GO TO 57
      FNO=MINDEX(SJ,5)
      GO TO 200

C - THIS RECORD IS A FORMAT.
C - ADD A CELL TO MINDEX(1,8) FOR THIS FORMAT.
   57 FNUM=FNUM+1
      IF(R.NE.S) NEXT(R)=0
      CALL GET2
      NEXT(R)=0
      IF(MINDEX(1,8).NE.0) GO TO 71
      MINDEX(1,8)=R
      CM=R
      GO TO 73
   71 CM=MINDEX(1,8)
   72 IF(NEXT(CM).EQ.0) GO TO 78
      CM=NEXT(CM)
      GO TO 72
   78 NEXT(CM)=R
      CM=NEXT(CM)
   73 RNO(CM)=FNUM
      SI=SI+1
      CMEM(CM)=SI
      MI=1
   74 MI=MI+1
      IF(MINDEX(MI,1).NE.BLANK) GO TO 74
      LK=MK-1
      DO 77 I=1,LK
      IF(I.GT.4) GO TO 60
      MINDEX(MI,I)=ACUM(I)
   60 SEQ(SI)=ACUM(I)
      SI=SI+1
   77 CONTINUE
   79 SEQ(SI)=STAR
      SI=SI+1
C - SET A POINTER TO THE FORMAT DEFINITION STORED IN 'SEQ'.
      MINDEX(MI,7)=SI
      DO 82 L=MK,NN
      SEQ(SI)=ACUM(L)
      SI=SI+1
   82 CONTINUE
      SEQ(SI)=STAR
C - SET A POINTER TO THE FORMAT NAME STORED IN 'SEQ'.
      MINDEX(MI,6)=CMEM(CM)
      MINDEX(MI,5)=FNUM
C - INITIALIZE A HEADER CELL FOR THE FORMAT TREE.
      CALL GET1
      AVAIL1=P
      MINDEX(MI,8)=P
      TOP(P)=MI
C - CHAIN A CELL TO THE HEADER FOR THE FIRST CLASS OF THIS
C - FORMAT DEFINITION.
      CALL GET1
      RR=P
      DD=P
      DOWN(P)=0

C - DETERMINE IF EACH CLASS HAS BEEN PREVIOUSLY DEFINED.
C - EACH CLASS IS UNIQUELY DEFINED, THEREFORE, THERE WILL BE
C - NO DUPLICATE CLASS ENTRIES IN THE MASTER INDEX.
C - IF A CLASS HAS BEEN PREVIOUSLY DEFINED, ITS
C - DESCENDANTS ARE LOCATED AND ADDED TO THE TREE.
      DI=0
      DJ=0
   85 DO 80 IK=1,30
      SERCH(IK)=BLANK
   80 FRST4(IK)=BLANK
   81 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 170
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.OP) GO TO 81
      I=MK
      J=1
   90 SERCH(J)=ACUM(I)
      IF(J.GT.4) GO TO 95
      FRST4(J)=ACUM(I)
   95 I=I+1
      IF(ACUM(I).EQ.COMMA.OR.ACUM(I).EQ.CP) GO TO 100
      J=J+1
      GO TO 90
  100 N=J
      CALL MISRCH
      IF(ANS.EQ.1) GO TO 150
      MI=1
  105 MI=MI+1
      IF(MINDEX(MI,1).NE.BLANK) GO TO 105
      DO 110 JK=1,4
  110 MINDEX(MI,JK)=FRST4(JK)
      IF(J.GT.4) GO TO 120
      MINDEX(MI,6)=0
      GO TO 130
  120 SI=SI+1
      MINDEX(MI,6)=SI
      DO 125 K=1,J
      SEQ(SI)=SERCH(K)
  125 SI=SI+1
      SEQ(SI)=STAR
  130 MINDEX(MI,5)=CX
      IF(TOP(RR).EQ.0) GO TO 131
      CALL GET1
      RIGHT(RR)=P
      RR=RIGHT(RR)
  131 TOP(RR)=MI
      MINDEX(MI,8)=RR
      DOWN(RR)=0
C - PROCESS THE NEXT CLASS IN THE FORMAT DEFINITION.
C - IF NONE THEN READ IN THE NEXT RECORD.
      MK=I
      GO TO 85
  150 IF(MINDEX(SJ,5).EQ.LX) GO TO 151
      DI=DI+1
      DESTAB(DI)=SJ
      MI=SJ
  151 IF(TOP(RR).EQ.0) GO TO 155
      CALL GET1
      RIGHT(RR)=P
      RR=RIGHT(RR)
  155 TOP(RR)=SJ
      MINDEX(SJ,8)=RR
      DOWN(RR)=0
      MK=I
      GO TO 85

C - ADD THE PREVIOUSLY DEFINED CLASSES TO THE TREE.
  170 MK=0
      IF(DESTAB(1).NE.BLANK) GO TO 175
  171 DO 173 I=1,30
      DESTAB(I)=BLANK
  173 CONTINUE
      GO TO 45
  175 DJ=DJ+1
      IF(DESTAB(DJ).EQ.BLANK) GO TO 171
  180 L=MINDEX(DESTAB(DJ),7)
      DO 193 J=1,241
  193 ACUM(J)=BLANK
      J=1
  197 ACUM(J)=SEQ(L)
      L=L+1
      J=J+1
      IF(SEQ(L).NE.STAR) GO TO 197
      ACUM(J)=SEQ(L)
      RR=MINDEX(DESTAB(DJ),8)
      CALL GET1
      DOWN(RR)=P
      RR=DOWN(RR)
      GO TO 85
C - THIS BLOCK OF CODE PROCESSES CLASS DEFINITIONS.
  200 IF(MINDEX(SJ,5).NE.CX.AND.MINDEX(SJ,5).NE.LX) GO TO 250
      SI=SI+1
      MINDEX(SJ,7)=SI
      DO 215 L=MK,NN
      SEQ(SI)=ACUM(L)
  215 SI=SI+1
      SEQ(SI)=STAR
      RR=MINDEX(SJ,8)
      CALL GET1
      DOWN(RR)=P
      RR=DOWN(RR)
      DOWN(RR)=0
      GO TO 85
C - THE ACCUMULATOR CONTAINS A DATA RECORD. TRAVERSE THE
C - TREE AND LOCATE ALL END NODES OF ALL BRANCHES. DATA
C - ELEMENTS ARE MAPPED ONTO THEIR RESPECTIVE CLASSES.
C - IF THE FIRST CHARACTER OF THE RECORD IS AN ASTERISK
C - THEN THE RECORD IS A DATA RECORD AND A MEMBER OF THE
C - SAME FORMAT AS THE LAST DATA RECORD PROCESSED.
  250 CALL TRAV
  251 NI=1
      NN=J-1
      RNUM=RNUM+1
C - STORE THE RECORD IN SEQUENTIAL STORAGE AND MAKE DATA
C - RECORD TABLE ENTRIES.
      SI=SI+1
      DRTAB(RNUM,1)=RNUM
      DRTAB(RNUM,2)=FNO
      DRTAB(RNUM,3)=SI
      IF(ACUM(1).EQ.EQS) MK=2
      DO 260 I=MK,NN
      SEQ(SI)=ACUM(I)
  260 SI=SI+1
      SEQ(SI)=STAR
C - PROCESS EACH DATA ELEMENT IN THE RECORD.
      SET=0
      CALL CH»-94(ACUM,FRST4,NN)
C - DETERMINE IF A MULTIPLE ENTRY EXISTS,
C - E.G., ((JOHN DOE),(ROBERT SMITH))
      MK=MK+1
  255 IF(ACUM(MK).EQ.OP) CALL MENT
      IF(MULT.EQ.1) MM=NI
  265 DO 267 I=1,30
  267 SERCH(I)=BLANK
      J=1
  270 SERCH(J)=ACUM(MK)
      J=J+1
      MK=MK+1
  271 IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.CP) GO TO 275
      GO TO 270

C - DETERMINE IF THE DATA ELEMENT HAS BEEN PREVIOUSLY
C - ENTERED IN THE MASTER INDEX. IF NOT MAKE ENTRIES IN
C - THE MASTER INDEX, SEQUENTIAL STORE, AND INITIALIZE THE C
C - MEMBERSHIP CELLS.
  275 N=J-1
      CALL MISRCH
      IF(ANS.EQ.0) GO TO 535
      SET=SET+4
      DO 455 JJ=1,30
  455 SERCH(JJ)=BLANK
      CM=MINDEX(SJ,8)
  460 IF(NEXT(CM).EQ.0) GO TO 465
      CM=NEXT(CM)
      GO TO 460
  465 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      RNO(R)=RNUM
      CMEM(R)=NODTAB(NI)
      SJ=NODTAB(NI)
      NI=NI+1
      IF(NODTAB(NI).NE.BLANK) GO TO 480
      GO TO 45
  480 IF(MINDEX(SJ,7).EQ.BLANK) GO TO 493
      CM=MINDEX(SJ,7)
  485 IF(NEXT(CM).EQ.0) GO TO 490
      CM=NEXT(CM)
      GO TO 485
  490 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      GO TO 495
  493 NEXT(R)=0
      CALL GET2
      MINDEX(SJ,7)=R
  495 RNO(R)=RNUM
      CMEM(R)=FNO
  492 IF(ACUM(MK).NE.COMMA) GO TO 498
  497 GO TO 255
      GO TO 265
  498 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.COMMA.OR.ACUM(MK).EQ.CP) GO TO 498
      IF(ACUM(MK).EQ.OP) GO TO 500
      GO TO 265
  500 IF(MULT.NE.1) GO TO 255
      MULT=0
      NI=MM
  505 MK=MK+1
      IF(ACUM(MK).EQ.STAR) GO TO 44
      IF(ACUM(MK).EQ.OP) GO TO 505
      GO TO 265
  535 DO 540 M=1,4
      SET=SET+1
  540 MINDEX(SJ,M)=FRST4(SET)
      IF(J.LE.5) GO TO 555
      SI=SI+1
      MINDEX(SJ,6)=SI
      LL=1
  545 SEQ(SI)=SERCH(LL)
      LL=LL+1
      IF(LL.EQ.J) GO TO 550
      SI=SI+1
      GO TO 545
  550 SI=SI+1
      SEQ(SI)=STAR
  555 IF(J.LE.5) MINDEX(SJ,6)=0
      MINDEX(SJ,5)=DX
      NEXT(R)=0
      CALL GET2
      MINDEX(SJ,8)=R
      RNO(R)=RNUM
      CMEM(R)=NODTAB(NI)
      L=NODTAB(NI)
      NI=NI+1
      IF(MINDEX(L,7).EQ.BLANK) GO TO 580
      CM=MINDEX(L,7)
  565 IF(NEXT(CM).EQ.0) GO TO 570
      CM=NEXT(CM)
      GO TO 565
  570 NEXT(R)=0
      CALL GET2
      NEXT(CM)=R
      RNO(R)=RNUM
      CMEM(R)=FNO
      GO TO 492
  580 NEXT(R)=0
      CALL GET2
      MINDEX(L,7)=R
      RNO(R)=RNUM
      CMEM(R)=FNO
      GO TO 492

C - THIS BLOCK OF CODE PROCESSES FORMAT DEFINITION, CLASS
C - DEFINITION, AND FORMAT NAME QUERIES.
  600 DO 605 I=1,N
  605 SERCH(I)=ACUM(I)
      CALL MISRCH
      IF(ANS.EQ.1) GO TO 615
  607 WRITE(6,610) (SERCH(I),I=1,30)
  610 FORMAT(1H ,'REQUEST NOT FULFILLED:',
     1/,T2,30A1,'WAS NOT FOUND')
      GO TO 44
  615 IF(SJ.NE.1) GO TO 645
C - OUTPUT ALL FORMAT NAMES.
      CM=MINDEX(1,8)
  618 ST=CMEM(CM)
      SE=ST
  620 IF(SEQ(SE+1).EQ.STAR) GO TO 625
      SE=SE+1
      GO TO 620
  625 WRITE (6,630) RNO(CM),(SEQ(I),I=ST,SE)
  630 FORMAT(1H ,'FORMAT NUMBER',I3,2X,30A1)
      IF(NEXT(CM).EQ.0) GO TO 635
      CM=NEXT(CM)
      GO TO 618
  635 WRITE (6,640)
  640 FORMAT (/T2,'REQUEST COMPLETE')
      FT=0
      GO TO 44
  645 IF(MINDEX(SJ,5).NE.DX) GO TO 650
  646 WRITE(6,647) (SERCH(I),I=1,J)
  647 FORMAT(1H ,'INVALID QUERY:',
     1/,T2,30A1,'IS A DATA ELEMENT')
      GO TO 44
  650 IF(MINDEX(SJ,5).NE.CX.AND.MINDEX(SJ,5).NE.LX)
     1 GO TO 670
      IF(MINDEX(SJ,5).NE.LX) GO TO 652
      WRITE(6,653) (SERCH(I),I=1,30)
  653 FORMAT(1H ,30A1,'HAS NO DESCENDANTS')
      GO TO 44
  652 ST=MINDEX(SJ,7)
      SE=ST
  655 IF(SEQ(SE+1).EQ.STAR) GO TO 660
      SE=SE+1
      GO TO 655
  660 WRITE(6,665) (ACUM(I),I=1,N),(SEQ(IX),IX=ST,SE)
  665 FORMAT(1H ,30A1,3(80A1,/))
      GO TO 44
  670 DO 673 I=1,N
  673 SERCH(I)=ACUM(I)
      CALL MISRCH
      IF(ANS.NE.1) GO TO 607
      GO TO 652

C - IF THE KEYWORD SPECIFIED IN THE QUERY IS A FORMAT NAME
C - THEN OUTPUT ALL DATA RECORDS WHICH ARE MEMBERS OF THAT
C - FORMAT. IF THE KEYWORD IS A CLASS THEN OUTPUT ALL DATA
C - ELEMENTS WHICH ARE MEMBERS OF THAT CLASS.
  700 DO 705 I=1,30
      IF(ACUM(I).EQ.OP) GO TO 707
      N=I
      SERCH(I)=ACUM(I)
  705 CONTINUE
  707 CALL MISRCH
      IF(ANS.EQ.1) GO TO 709
      WRITE(6,610) (SERCH(I),I=1,30)
      GO TO 44
  709 IF(MINDEX(SJ,5).EQ.DX) GO TO 646
  710 IF(MINDEX(SJ,5).EQ.CX.OR.MINDEX(SJ,5).EQ.LX) GO TO 750
  720 J=0
  725 J=J+1
      IF(DRTAB(J,2).EQ.BLANK) GO TO 635
      IF(DRTAB(J,2).NE.MINDEX(SJ,5)) GO TO 725
      ST=DRTAB(J,3)
      SE=ST
  730 IF(SEQ(SE+1).EQ.STAR) GO TO 733
      SE=SE+1
      GO TO 730
  733 WRITE (6,735) (SEQ(I),I=ST,SE)
  735 FORMAT (2(/,T2,120A1))
      GO TO 725
  750 IF(MINDEX(SJ,5).EQ.LX) GO TO 760
      WRITE(6,753) (SERCH(I),I=1,30)
  753 FORMAT(1H ,'INVALID QUERY:',
     1/,T2,'DETERMINE DESCENDANTS OF: ',30A1,
     2/,T2,'USE DESCENDANTS AS KEYWORDS')
      GO TO 44
  760 MI=1
  765 MI=MI+1
      IF(MINDEX(MI,1).EQ.BLANK) GO TO 635
      IF(MINDEX(MI,5).NE.DX) GO TO 765
      CM=MINDEX(MI,8)
  770 IF(CMEM(CM).EQ.SJ) GO TO 771
  773 IF(NEXT(CM).EQ.0) GO TO 765
      CM=NEXT(CM)
      GO TO 770
C - IF FT=1 THEN OUTPUT ONLY THOSE DATA ELEMENTS WHICH
C - ARE MEMBERS OF THE CLASS AND FORMAT SPECIFIED
C - IN THE QUERY.
  771 IF(FT.EQ.0) GO TO 774
      IF(DRTAB(RNO(CM),2).NE.FNO) GO TO 773
  774 IF(MINDEX(MI,6).EQ.0) GO TO 785
      ST=MINDEX(MI,6)
      SE=ST
  778 IF(SEQ(SE+1).EQ.STAR) GO TO 780
      SE=SE+1
      GO TO 778
  780 WRITE (6,783) (SEQ(I),I=ST,SE)
  783 FORMAT (/,T2,30A1)
      GO TO 765
  785 WRITE (6,788) (MINDEX(MI,I),I=1,4)
  788 FORMAT (/,T2,4A1)
      GO TO 765

C - THIS  BLOCK  OF  CODE  PROCESSES  HYPHEN  AND 

C - B0QLrAN‘AN0‘  REQUESTS. 

800  DO  eOl  1=1,100 
RECTAB(n  =  6LANK 

801  CONTINUE 
RI=0 
RSMK=1 

DO  805  1=1,30 

IF(ACUM(I).EQ.OP)  GO  TO  810 
N=I 

SERCH(I)=ACUM(I) 

805  CONTINUE 
810  CALL  MISRCH 

IF(ANS.EQ.O)  GO  TO  607 

IF(MIN0EX(Sj,5).NE.CX.AN0.MIN0EX(SJ,5) .NE.LX) 

1  GO  TO  820  .  .  , 

WRI^E(6,815)  (SERCH(I),I=1,30) 

815  FORMATdH  , 'INVALID  QUERY:', 

1/,T2,30A1, 'IS  NOT  A  FORMAT  NAME') 

GO  TO  44 

  820 IF (MINDEX(SJ,5).EQ.DX) GO TO 646
      FNO=MINDEX(SJ,5)
      CALL TRAV
      DO 821 I=1,50
      IF (NODTAB(I).EQ.BLANK) GO TO 823
      NODE=I
  821 CONTINUE
  823 CLAS=0
      DO 824 I=1,241
      IF (ACUM(I).EQ.COMMA) CLAS=CLAS+1
      IF (ACUM(I).EQ.STAR) GO TO 826
  824 CONTINUE
  826 IF (CLAS.LT.NODE) GO TO 827
      WRITE (6,828)
  828 FORMAT (1H ,'INVALID QUERY: NUMBER OF KEYWORD ',
     1  /,T2,'POSITIONS EXCEEDS THE NUMBER OF CLASSES ',
     2  /,T2,'CONTAINED IN THE SPECIFIED FORMAT')
      GO TO 44
  827 CALL QSCAN(&822,&855,&870)
C - CMNO IS CLASS MEMBERSHIP NO.
  822 CMNO=NODTAB(HCTR)
      DI=0
      CALL MISRCH
      IF (ANS.EQ.1) GO TO 831
      WRITE (6,825) (SERCH(I), I=1,30)
  825 FORMAT (1H ,30A1,2X,'WAS NOT FOUND:',
     1  /,T2,'RECORDS SATISFYING OTHER KEYWORDS, IF ANY, ',
     2  'ARE LISTED')
      AT=AE
      CALL QSCAN1(&822,&855,&870)

  830 IF (RECTAB(RI).EQ.BLANK.OR.RECTAB(RI).EQ.STAR)
     1  GO TO 893
      RI=RI+1
      RECTAB(RI)=STAR
      RSMK=RI
      AT=AE
      CALL QSCAN1(&822,&855,&870)
  831 IF (MINDEX(SJ,5).EQ.CX) GO TO 834
      IF (MINDEX(SJ,5).EQ.DX) GO TO 835
      FT=1
      GO TO 760
  834 WRITE (6,753) (SERCH(I), I=1,30)
      GO TO 44

  835 CM=MINDEX(SJ,8)
  840 IF (CMEM(CM).EQ.CMNO) GO TO 842
  841 IF (NEXT(CM).EQ.0.AND.DI.EQ.0) GO TO 847
      IF (NEXT(CM).EQ.0) GO TO 843
      CM=NEXT(CM)
      GO TO 840
  842 DI=RNO(CM)
      IF (ORTAB(DI,2).EQ.FNO) GO TO 850
      GO TO 841
  847 WRITE (6,848) (SERCH(I), I=1,30)
  848 FORMAT (1H ,30A1,'WAS FOUND BUT IS NOT A MEMBER OF',
     1  /,T2,'THE CLASS SPECIFIED IN THE QUERY:',
     2  /,T2,'RECORDS SATISFYING OTHER KEYWORDS, IF ANY, ',
     3  'ARE LISTED')
      AT=AE
      CALL QSCAN1(&822,&855,&870)
  843 WRITE (6,851) (SERCH(I), I=1,30)
  851 FORMAT (1H ,30A1,'WAS FOUND BUT IS NOT A MEMBER OF',
     1  /,T2,'THE FORMAT SPECIFIED IN THE QUERY:',
     2  /,T2,'IT IS A MEMBER OF:')

      I=0
      CM=MINDEX(SJ,8)
  310 I=I+1
      FTAB(I)=ORTAB(RNO(CM),2)
  315 IF (NEXT(CM).EQ.0) GO TO 325
      CM=NEXT(CM)
      DO 320 FI=1,30
      IF (FTAB(FI).EQ.BLANK) GO TO 310
      IF (FTAB(FI).EQ.ORTAB(RNO(CM),2))
     1  GO TO 315
  320 CONTINUE
  325 CM=MINDEX(1,8)
  327 DO 330 I=1,30
      IF (FTAB(I).EQ.BLANK) GO TO 333
      IF (FTAB(I).EQ.RNO(CM)) GO TO 335
  330 CONTINUE
  333 IF (NEXT(CM).EQ.0) GO TO 355
      CM=NEXT(CM)
      GO TO 327
  335 ST=CMEM(CM)
      SE=ST
  340 IF (SEQ(SE+1).EQ.STAR) GO TO 345
      SE=SE+1
      GO TO 340
  345 WRITE (6,350) (SEQ(IX), IX=ST,SE)
  350 FORMAT (1H ,T2,30A1)
      GO TO 333
  355 DO 357 I=1,30
  357 FTAB(I)=BLANK
      WRITE (6,360)
  360 FORMAT (1H ,T2,'RECORDS SATISFYING OTHER KEYWORDS, ',
     1  'IF ANY, ',/,T2,'ARE LISTED')
      AT=AE
      CALL QSCAN1(&822,&855,&870)

  850 DO 852 RK=RSMK,100
      IF (RECTAB(RK).EQ.BLANK) GO TO 853
      IF (RECTAB(RK).EQ.RNO(CM)) GO TO 854
  852 CONTINUE
  853 RI=RI+1
      RECTAB(RI)=RNO(CM)
  854 IF (S2.EQ.1) GO TO 362
      RI=RI+1
      RECTAB(RI)=STAR
      RSMK=RI
  362 AT=AE
      CALL QSCAN1(&822,&855,&870)
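Blocks 850 to 362 build RECTAB as a sequence of record-number lists, one per keyword position, each closed by STAR, with RSMK marking where the open list begins. A minimal Python sketch of that bookkeeping follows; the plain list for RECTAB, the 0-based `seg_start` in place of RSMK and the function names are illustrative assumptions.

```python
STAR = "*"   # segment terminator, as in the listing


def add_record(rectab, seg_start, rno):
    """Append rno to the open segment rectab[seg_start:] unless it is already there."""
    if rno not in rectab[seg_start:]:
        rectab.append(rno)


def close_segment(rectab):
    """End the current keyword's segment with STAR; return where the next one starts."""
    rectab.append(STAR)
    return len(rectab)
```

When an `&` follows a keyword (S2=1), the listing skips the STAR so that alternatives accumulate in one list; in the sketch that corresponds to simply not calling `close_segment` between them.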

C - THIS BLOCK OF CODE PROCESSES ALPHABETIC AND/OR
C - NUMERIC RANGE REQUESTS.
  855 CMNO=NODTAB(HCTR)
      SJ=2
  857 IF (MINDEX(SJ,5).NE.DX) GO TO 865
      I=1
      IF (MINDEX(SJ,6).EQ.0) GO TO 860
      J=MINDEX(SJ,6)
  858 IF (SEQ(J).EQ.SERCH(I).AND.SEQ(J).EQ.UPRAN(I))
     1  GO TO 859
      IF (SEQ(J).GE.SERCH(I).AND.SEQ(J).LE.UPRAN(I))
     1  GO TO 861
      GO TO 865
  861 IF (SEQ(J).EQ.SERCH(I)) GO TO 863
      IF (SEQ(J).EQ.UPRAN(I)) GO TO 866
      GO TO 864
  859 I=I+1
      J=J+1
      IF (SERCH(I).EQ.BLANK) GO TO 864
      GO TO 858
  863 I=I+1
      J=J+1
      IF (SERCH(I).EQ.BLANK.OR.SEQ(J).GE.SERCH(I))
     1  GO TO 864
      GO TO 865
  866 I=I+1
      J=J+1
      IF (SERCH(I).EQ.BLANK.OR.SEQ(J).LE.UPRAN(I))
     1  GO TO 864
      GO TO 865

  860 IF (MINDEX(SJ,I).EQ.SERCH(I).AND.MINDEX(SJ,I).EQ.UPRAN(I))
     1  GO TO 862
      IF (MINDEX(SJ,I).GE.SERCH(I).AND.MINDEX(SJ,I).LE.UPRAN(I))
     1  GO TO 871
      GO TO 865
  871 IF (MINDEX(SJ,I).EQ.SERCH(I)) GO TO 873
      IF (MINDEX(SJ,I).EQ.UPRAN(I)) GO TO 876
      GO TO 864
  862 I=I+1
      IF (SERCH(I).EQ.BLANK) GO TO 864
      GO TO 860
  873 I=I+1
      IF (SERCH(I).EQ.BLANK.OR.MINDEX(SJ,I).GE.SERCH(I))
     1  GO TO 864
      GO TO 865
  876 I=I+1
      IF (SERCH(I).EQ.BLANK.OR.MINDEX(SJ,I).LE.UPRAN(I))
     1  GO TO 864

  865 SJ=SJ+1
      IF (MINDEX(SJ,1).NE.BLANK) GO TO 857
      S1=0


      GO TO 830
  864 CM=MINDEX(SJ,8)
  856 IF (CMEM(CM).EQ.CMNO) GO TO 867
  872 IF (NEXT(CM).EQ.0) GO TO 865
      CM=NEXT(CM)
      GO TO 856
  867 DI=RNO(CM)
      IF (ORTAB(DI,2).EQ.FNO) GO TO 868
      GO TO 872
  868 DO 883 RK=RSMK,100
      IF (RECTAB(RK).EQ.BLANK) GO TO 869
      IF (RECTAB(RK).EQ.RNO(CM)) GO TO 865
  883 CONTINUE
  869 RI=RI+1
      RECTAB(RI)=RNO(CM)
      GO TO 865
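Blocks 855 to 876 test each data element against a lower bound (SERCH) and an upper bound (UPRAN), one character at a time. The Python function below is a simplified sketch of that test, not a character-for-character translation: it compares only as many leading characters of the word as each bound has, so a one-letter upper bound such as C admits every word beginning with C. Like the listing, it compares characters rather than numeric values, so numeric bounds behave as intended only when the numbers have the same number of digits. The function name is hypothetical.

```python
def in_range(word, lower, upper):
    """Prefix-inclusive range test, a simplified sketch of blocks 858-876.

    Only as many leading characters of `word` are compared as each bound
    has, so 'CHARLIE' falls inside the range A:C.
    """
    return lower <= word[:len(lower)] and word[:len(upper)] <= upper
```

For example, `in_range("BAKER", "A", "C")` holds, while `in_range("DELTA", "A", "C")` does not.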

C - RECORD NUMBERS WHICH SATISFY INDIVIDUAL QUERY KEYWORDS
C - ARE STORED IN A LIST ('RECTAB'). THIS BLOCK OF CODE
C - SEARCHES 'RECTAB' TO DETERMINE WHICH RECORDS SATISFY
C - ALL QUERY KEYWORDS.
  870 IF (RI.EQ.0) GO TO 893
      IF (RECTAB(RI).EQ.STAR) GO TO 887
      RI=RI+1
      RECTAB(RI)=STAR
  887 RECTAB(RI+1)=DOLS
      W=0
      RI=1
      TRNO=RECTAB(RI)
      RS=1
  880 RS=RS+1
      IF (RECTAB(RS).NE.STAR) GO TO 880
      RMK=RS
  881 RS=RS+1
  882 IF (RECTAB(RS).EQ.DOLS) GO TO 895
  885 IF (RECTAB(RS).EQ.TRNO) GO TO 890
      RS=RS+1
      IF (RECTAB(RS).NE.STAR) GO TO 885
      IF (W.EQ.0) GO TO 893
      GO TO 635
  890 RS=RS+1
      W=1
      IF (RECTAB(RS).EQ.STAR) GO TO 881
      GO TO 890
  893 WRITE (6,892)
  892 FORMAT (1H ,'REQUEST NOT FULFILLED: ',
     1  'NO RECORDS SATISFY THE QUERY')
      GO TO 44
  895 ST=ORTAB(TRNO,3)
      SE=ST
  896 IF (SEQ(SE+1).EQ.STAR) GO TO 898
      SE=SE+1
      GO TO 896
  898 WRITE (6,899) (SEQ(I), I=ST,SE)
  899 FORMAT (/,T2,2(/,T2,120A1))
      RI=RI+1
      IF (RI.EQ.RMK) GO TO 635
      TRNO=RECTAB(RI)
      RS=RMK+1
      GO TO 882
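The comment above describes an intersection: a record satisfies the query only if its number appears in every STAR-terminated list in RECTAB. A Python sketch of that intersection is given below; it is not a line-for-line translation of labels 870 to 898 (in particular it does not reproduce their exits to 893 and 635), and the function name and list representation are assumptions.

```python
def records_satisfying_all(rectab, star="*"):
    """Record numbers that appear in every STAR-terminated segment of rectab.

    Candidates come from the first segment, in order, much as TRNO walks
    the first list and probes each later one.
    """
    segments, current = [], []
    for item in rectab:
        if item == star:
            segments.append(current)
            current = []
        else:
            current.append(item)
    if current:                  # a trailing segment without STAR still counts,
        segments.append(current)  # as block 870 closes it before searching
    if not segments:
        return []
    later = [set(seg) for seg in segments[1:]]
    return [r for r in segments[0] if all(r in seg for seg in later)]
```

For instance, the lists `3 5 9 *`, `5 9 11 *` and `9 5 *` leave records 5 and 9.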

  900 WRITE (6,10)
   10 FORMAT (1H0,T40,'* * * FLY NAVY * * *')
      TERM=1
      NEXT(R)=0
      GO TO 45
  925 WRITE (6,6)
    6 FORMAT (1H ,'ERROR: UNBALANCED PARENTHESIS')
      WRITE (6,7) (WORK(I), I=1,240)
    7 FORMAT (T2,2(/,T2,120A1))
      IF (TERM.EQ.1) GO TO 45
      ER=1
      GO TO 45
  950 WRITE (6,8)
    8 FORMAT (1H ,'ERROR: RECORD LENGTH EXCEEDS 240 CHARACTERS')
      WRITE (6,7) (WORK(I), I=1,240)
      IF (TERM.EQ.1) GO TO 45
      READ (4,2) (WORK(I), I=1,80)
      WRITE (6,2)
      ER=1
      GO TO 45
 1000 WRITE (6,999)
  999 FORMAT (T2,'PROGRAM TERMINATION')
      STOP
      END
      SUBROUTINE CHAR4(ARR,F4,L)
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION ARR(241),F4(241)
      DATA OP/'('/,CP/')'/,BLANK/' '/,COMMA/','/
C - THIS SUBROUTINE STORES THE FIRST FOUR CHARACTERS OF
C - EACH DATA ELEMENT IN THE ARRAY 'FRST4'. DURING DATA
C - ELEMENT PROCESSING EACH CHARACTER BLOCK IS MOVED INTO
C - THE MASTER INDEX, IF NOT PREVIOUSLY ENTERED.
      DO 5 I=1,241
    5 F4(I)=BLANK
      I=0
      J=1
      CCTR=0
    7 I=I+1
      IF (ARR(I).NE.OP) GO TO 7
   10 I=I+1
      IF (I.GE.L) GO TO 60
      IF (ARR(I).EQ.OP.OR.ARR(I).EQ.CP.OR.ARR(I).EQ.COMMA)
     1  GO TO 20
   25 F4(J)=ARR(I)
      J=J+1
      CCTR=CCTR+1
      IF (CCTR-4) 10,30,30
   20 IF (ARR(I).EQ.OP.OR.CCTR.EQ.0) GO TO 10
   35 F4(J)=BLANK
      J=J+1
      CCTR=CCTR+1
      IF (CCTR-4) 35,30,30
   30 CCTR=0
      IF (ARR(I).NE.COMMA) GO TO 40
      I=I+1
      IF (ARR(I).NE.OP) GO TO 25
   40 I=I+1
      IF (I.GE.L) GO TO 60
      IF (ARR(I).NE.OP.AND.ARR(I).NE.CP.AND.ARR(I).NE.COMMA)
     1  GO TO 40
   45 I=I+1
      IF (I.GE.L) GO TO 60
      IF (ARR(I).EQ.OP.OR.ARR(I).EQ.CP.OR.ARR(I).EQ.COMMA)
     1  GO TO 45
      GO TO 25
   60 RETURN
      END
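The comment in CHAR4 describes a simple transformation: the first four characters of every data element, padded with blanks to exactly four, stored one after another. A Python sketch of that description follows; the function name `first_four` and the string input are assumptions, and the sketch follows the comment rather than reproducing every branch of labels 7 to 45.

```python
import re


def first_four(record):
    """First four characters of each data element, blank-padded and
    concatenated, as the CHAR4 comment describes (sketch).

    Elements are delimited by '(', ')' and ','; scanning starts at the
    record's first '(' as CHAR4 does.
    """
    body = record[record.index("("):]
    elements = [e for e in re.split(r"[(),]", body) if e]
    return "".join(e[:4].ljust(4) for e in elements)
```

For example, a record `REC(SMITH,(JO),ABCD)` yields the twelve characters `SMITJO  ABCD`.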


      SUBROUTINE INIT1
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION TOP(603),RIGHT(602),DOWN(601)
      COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
      EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
C - THIS SUBROUTINE INITIALIZES CELLS USED IN
C - TREE STRUCTURES.
      DO 10 I=1,601,3
      TOP(I)=0
   10 RIGHT(I)=0
      DO 20 I=1,598,3
   20 DOWN(I)=I+3
      DOWN(601)=0
      RETURN
      END


      SUBROUTINE GET1
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION TOP(603),RIGHT(602),DOWN(601)
      COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
      EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
      P=Q
      Q=DOWN(Q)
      RETURN
      END
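INIT1 and GET1 manage a pool of three-word cells. Because TOP(3), RIGHT(2) and DOWN(1) share one storage word, the cell starting at index I keeps TOP(I), RIGHT(I) and DOWN(I) in three consecutive words, and INIT1 chains the 201 cells (I = 1, 4, ..., 601) into a free list through DOWN. The Python class below sketches that layout, assuming Q holds the head of the free list; the class name, the `word` list and the exhaustion check are illustrative additions not present in GET1. INIT2 and GET2 apply the same scheme to RNO, CMEM and NEXT.

```python
class CellPool:
    """Cell pool in the style of INIT1/GET1 (sketch; names are hypothetical).

    `word` is one flat, 1-based store indexed like TOP.  Under
    EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1)) the cell at index i
    (i = 1, 4, 7, ...) keeps TOP in word[i], RIGHT in word[i+1] and
    DOWN in word[i+2].  Free cells are chained through DOWN.
    """

    def __init__(self, ncells=201):
        last = 1 + 3 * (ncells - 1)              # 601 for 201 cells
        self.word = [0] * (last + 3)             # indices 0..last+2; word[0] unused
        for i in range(1, last + 1, 3):
            self.word[i + 2] = i + 3 if i < last else 0   # DOWN(i)
        self.free = 1                            # head of the free list (Q)

    def get(self):
        """Take the first free cell, as GET1 does with P=Q, Q=DOWN(Q)."""
        if self.free == 0:                       # GET1 itself has no such check
            raise MemoryError("cell pool exhausted")
        p = self.free
        self.free = self.down(p)
        return p

    def top(self, i):
        return self.word[i]

    def right(self, i):
        return self.word[i + 1]

    def down(self, i):
        return self.word[i + 2]
```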


      SUBROUTINE INIT2
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION RNO(603),CMEM(602),NEXT(601)
      COMMON/TWO/RNO,AVAIL2,S,R,CM,RM
      EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
C - THIS SUBROUTINE INITIALIZES CLASS MEMBERSHIP CELLS.
      DO 10 I=1,601,3
      RNO(I)=0
   10 CMEM(I)=0
      DO 20 I=1,598,3
   20 NEXT(I)=I+3
      NEXT(601)=0
      RETURN
      END


      SUBROUTINE GET2
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION RNO(603),CMEM(602),NEXT(601)
      COMMON/TWO/RNO,AVAIL2,S,R,CM,RM
      EQUIVALENCE (RNO(3),CMEM(2),NEXT(1))
      R=S
      S=NEXT(S)
      RETURN
      END


      SUBROUTINE MISRCH
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION ACUM(241)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000)
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      DATA BLANK/' '/,STAR/'*'/
C - THIS SUBROUTINE SEARCHES THE MASTER INDEX FOR THE
C - WORD CONTAINED IN THE ARRAY 'SERCH'.
      ANS=0
      SJ=0
    5 SJ=SJ+1
      SK=1
   10 IF (MINDEX(SJ,SK).EQ.SERCH(1)) GO TO 15
      SJ=SJ+1
      IF (MINDEX(SJ,SK).EQ.BLANK) GO TO 200
      GO TO 10
   15 IF (N-4) 16,16,20
   16 DO 17 J=2,4
      IF (MINDEX(SJ,J).NE.SERCH(J)) GO TO 5
   17 CONTINUE
      GO TO 100
   20 I=1+1
      KK=MINDEX(SJ,6)+1
      IF (SEQ(KK).NE.SERCH(I)) GO TO 5
   55 KK=KK+1
      IF (SEQ(KK).EQ.STAR) GO TO 90
      I=I+1
      IF (SEQ(KK).EQ.SERCH(I)) GO TO 55
      GO TO 5
   90 I=I+1
      IF (SERCH(I).NE.BLANK) GO TO 200
  100 ANS=1
  200 RETURN
      END
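MISRCH compares the first four characters of the search word with columns 1 to 4 of each master-index row and, for longer words, continues in SEQ from the position held in column 6; the output code at labels 774 to 780 indicates that SEQ stores the whole word from that position up to a terminating STAR. A Python sketch of the lookup under those assumptions (rows as lists, a 1-based SEQ pointer, the hypothetical function name `misrch`) is given below. Unlike the listing, which returns ANS=0 when a stored word is a proper prefix of the search word, the sketch simply keeps scanning.

```python
def misrch(mindex, seq, word):
    """Linear search of a master index, sketching MISRCH.

    mindex: list of rows; row[0:4] hold the first four characters
            (blank-padded) and row[5] is 0 or the 1-based start of the
            full word in `seq` (MINDEX(SJ,6)).
    seq:    list of characters; each long word is terminated by '*'.
    Returns the 1-based row number of the match, or 0 (ANS = 0) if absent.
    """
    for sj, row in enumerate(mindex, start=1):
        if "".join(row[0:4]) != word[:4].ljust(4):
            continue
        if len(word) <= 4:
            return sj
        start = row[5]
        if start == 0:
            continue
        end = seq.index("*", start - 1)
        if "".join(seq[start - 1:end]) == word:
            return sj
    return 0
```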
      SUBROUTINE IDENT(*,*,*)
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      DATA OP/'('/,CP/')'/,COMMA/','/,HYP/'-'/,AMP/'&'/,COL/':'/
C - THIS SUBROUTINE DETERMINES WHETHER THE ACCUMULATOR
C - CONTAINS AN INPUT RECORD OR A QUERY AND RETURNS TO
C - THE APPROPRIATE CODE BLOCK IN THE M/PROG FOR
C - FURTHER TESTING AND PROCESSING.
      F1=0
      F2=0
      F3=0
      F4=0
      DO 10 I=1,N
      IF (ACUM(I).EQ.OP) F1=1
      IF (ACUM(I).EQ.HYP) F2=1
      IF (ACUM(I).EQ.COL) F3=1
      IF (ACUM(I).EQ.AMP) F4=1
   10 CONTINUE
   11 IF (F1.EQ.0) GO TO 40
      IF (F2.EQ.0.AND.F3.EQ.0.AND.F4.EQ.0) GO TO 30
      IF (F3.EQ.1.OR.F4.EQ.1) GO TO 50
      J=1
   15 IF (ACUM(J).EQ.OP) GO TO 17
      J=J+1
      GO TO 15
   17 DO 20 I=J,N
      IF (ACUM(I).EQ.OP.OR.ACUM(I).EQ.CP.OR.ACUM(I).EQ.COMMA)
     1  GO TO 20
      IF (ACUM(I).NE.HYP) GO TO 50
   20 CONTINUE
   21 GO TO 60
   30 RETURN
   40 RETURN 1
   50 RETURN 2
   60 RETURN 3
      END
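IDENT sets four flags while scanning the accumulator and then chooses one of four exits. The Python function below restates that decision table; returning 0 for the plain RETURN and 1 to 3 for RETURN 1 to RETURN 3 is an illustrative encoding, and `acum` is assumed to hold exactly the N significant characters.

```python
def ident(acum):
    """Exit IDENT would take for the accumulator text (0 = plain RETURN)."""
    has_op = "(" in acum                       # F1
    has_hyp = "-" in acum                      # F2
    has_col = ":" in acum                      # F3
    has_amp = "&" in acum                      # F4
    if not has_op:
        return 1                               # label 40
    if not (has_hyp or has_col or has_amp):
        return 0                               # label 30
    if has_col or has_amp:
        return 2                               # label 50
    body = acum[acum.index("("):]              # loop 17-20 starts at the first '('
    if all(ch in "(),-" for ch in body):
        return 3                               # only hyphens as keywords: label 60
    return 2                                   # hyphens mixed with keywords: label 50
```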


      SUBROUTINE TRAV
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION TOP(603),RIGHT(602),DOWN(601)
      DIMENSION LEVEL(50),NODTAB(50)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
      COMMON/ONE/TOP,AVAIL1,P,Q,RR,DD,YY
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      COMMON/FOUR/NODTAB,FO
      EQUIVALENCE (TOP(3),RIGHT(2),DOWN(1))
      DATA LX/'L'/,BLANK/' '/
C - THIS SUBROUTINE TRAVERSES THE FORMAT TREE AND LOCATES
C - THE END NODES OF EACH BRANCH. THE CLASS CORRESPONDING
C - TO EACH END NODE IS STORED IN THE ARRAY 'NODTAB'.
      DO 10 I=1,50
   10 NODTAB(I)=BLANK
      I=0
      AVAIL1=MINDEX(SJ,8)
      RR=DOWN(AVAIL1)
      L=1
      LEVEL(L)=AVAIL1
   12 DD=RR
C - EACH LEVEL IN THE TREE IS ASSIGNED A NUMBER WHICH IS
C - STORED IN A STACK. AS THE TREE IS TRAVERSED
C - THE STACK IS PUSHED DOWN OR POPPED UP ACCORDINGLY.
   15 IF (DOWN(DD).EQ.0) GO TO 20
      L=L+1
      LEVEL(L)=DD
      DD=DOWN(DD)
      GO TO 15
   20 I=I+1
      NODTAB(I)=TOP(DD)
      MINDEX(TOP(DD),5)=LX
      IF (RIGHT(DD).EQ.0) GO TO 25
      DD=RIGHT(DD)
      GO TO 15
   25 IF (LEVEL(L).EQ.RR) GO TO 30
      IF (LEVEL(L).EQ.AVAIL1) GO TO 35
      DD=LEVEL(L)
      IF (RIGHT(DD).NE.0) GO TO 27
      L=L-1
      GO TO 25
   27 DD=RIGHT(DD)
      L=L-1
      GO TO 15
   30 IF (RIGHT(RR).EQ.0) GO TO 35
      RR=RIGHT(RR)
      IF (DOWN(RR).NE.0) GO TO 12
      I=I+1
      NODTAB(I)=TOP(RR)
      MINDEX(TOP(RR),5)=LX
      GO TO 30
   35 RETURN
      END
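TRAV walks the format tree, stored with first-child (DOWN) and next-sibling (RIGHT) links, and records the class (TOP) of every end node, using LEVEL as a stack of ancestors. An equivalent traversal in Python is sketched below, with dictionaries for the links (0 meaning no link) and a list as the stack. It returns the classes in the same left-to-right order but omits TRAV's side effect of flagging each class with LX in MINDEX; the function name and data layout are assumptions.

```python
def leaf_classes(top, right, down, root):
    """Classes of the end nodes below `root`, left to right (sketch of TRAV).

    top/right/down: dicts cell -> value, where 0 or a missing key means
    'no link'.  `stack` holds the ancestors whose siblings are still pending.
    """
    leaves = []
    stack = []
    node = down.get(root, 0)
    while node or stack:
        if node == 0:
            node = right.get(stack.pop(), 0)   # resume with the parent's next sibling
            continue
        if down.get(node, 0):
            stack.append(node)                 # push, then descend to the first child
            node = down[node]
        else:
            leaves.append(top[node])           # end node: record its class
            node = right.get(node, 0)
    return leaves
```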


      SUBROUTINE WENT
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      COMMON/FIVE/MULT,MK,EK
      DATA OP/'('/,CP/')'/,STAR/'*'/
C - 'MK' SCANS THE DATA ELEMENT FROM ITS FIRST 'OP'
C - (COUNTING 'OP'S) UNTIL THE FIRST CHARACTER IS
C - ENCOUNTERED. THEN 'EK' SCANS FROM 'MK' (COUNTING 'CP'S)
C - UNTIL AN 'OP' IS ENCOUNTERED. IF THE PARENTHESES ARE
C - UNBALANCED WHEN 'EK' STOPS THEN A MULTIPLE ENTRY EXISTS.
      PCTR=0
      MULT=0
    5 IF (ACUM(MK).EQ.OP) GO TO 10
      IF (ACUM(MK).EQ.STAR) GO TO 30
      MK=MK+1
      GO TO 5
   10 PCTR=PCTR+1
      MK=MK+1
      IF (ACUM(MK).EQ.STAR) GO TO 30
      IF (ACUM(MK).EQ.OP) GO TO 10
   15 EK=MK
   20 EK=EK+1
      IF (ACUM(EK).EQ.CP) PCTR=PCTR-1
      IF (ACUM(EK).EQ.STAR) GO TO 30
      IF (ACUM(EK).EQ.OP) GO TO 25
      GO TO 20
   25 IF (PCTR.NE.0) MULT=1
   30 RETURN
      END
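The comment in this subroutine describes a parenthesis count: MK counts the run of opening parentheses in front of a data element, EK then subtracts closing parentheses until the next opening one, and a non-zero balance at that point signals a multiple entry. A Python sketch of that rule on a STAR-terminated string follows; the function name, the 0-based `mk` argument and the assumption that the scan at label 5 advances MK until it reaches an opening parenthesis are illustrative.

```python
def is_multiple_entry(acum, mk=0):
    """Sketch of the WENT test on a '*'-terminated string.

    Returns True when parentheses are still open at the point where the
    next '(' begins, i.e. when MULT would be set to 1.
    """
    while acum[mk] != "(":           # assumed: advance MK to the first '('
        if acum[mk] == "*":
            return False
        mk += 1
    depth = 0
    while acum[mk] == "(":           # MK: count the run of '(' (PCTR)
        depth += 1
        mk += 1
        if acum[mk] == "*":
            return False
    for ch in acum[mk + 1:]:         # EK: scan on from MK, counting ')'
        if ch == ")":
            depth -= 1
        elif ch == "*":
            return False
        elif ch == "(":
            return depth != 0        # unbalanced at the next '(' -> multiple entry
    return False
```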


      SUBROUTINE QSCAN(*,*,*)
      IMPLICIT INTEGER*2 (A-Z)
      DIMENSION UPRAN(30)
      DIMENSION SERCH(30),MINDEX(500,8),SEQ(4000),ACUM(241)
      COMMON/THREE/ANS,SERCH,MINDEX,ACUM,SJ,SK,N,SEQ
      COMMON/SIX/S2,AT,AE,HCTR,UPRAN
      DATA OP/'('/,CP/')'/,COMMA/','/,BLANK/' '/,HYP/'-'/,STAR/'*'/,
     1  COL/':'/,AMP/'&'/
C - THIS SUBROUTINE LOCATES KEYWORDS IN THE QUERY,
C - DETERMINES IF ALPHABETIC/NUMERIC RANGES ARE REQUESTED,
C - LOADS THE 'SERCH' ARRAY, AND RETURNS TO THE M/PROG FOR
C - QUERY PROCESSING.
      AT=N+1
   10 IF (ACUM(AT).NE.OP) GO TO 15
      AT=AT+1
      GO TO 10
   15 HCTR=0
   20 IF (ACUM(AT).EQ.HYP) GO TO 40
      AE=AT
      IF (S2.EQ.1) GO TO 22
      IF (S1.EQ.0) HCTR=HCTR+1
   22 DO 25 I=1,30
      SERCH(I)=BLANK
      UPRAN(I)=BLANK
   25 CONTINUE
      N=1
   30 SERCH(N)=ACUM(AE)
      AE=AE+1
      IF (ACUM(AE).NE.COMMA.AND.ACUM(AE).NE.CP) GO TO 32
      S1=0
      S2=0
      GO TO 50
   32 IF (ACUM(AE).NE.COL) GO TO 34
C - ARRAY 'SERCH' IS LOADED WITH THE LOWER RANGE LIMIT
C - AND ARRAY 'UPRAN' IS LOADED WITH THE UPPER RANGE LIMIT.
      S1=1
      UI=1
      AE=AE+1
   31 UPRAN(UI)=ACUM(AE)
      AE=AE+1
      IF (ACUM(AE).EQ.COMMA.OR.ACUM(AE).EQ.CP) GO TO 60
      UI=UI+1
      GO TO 31
   34 IF (ACUM(AE).EQ.AMP) GO TO 35
      N=N+1
      GO TO 30
   35 S2=1
      GO TO 50
   40 HCTR=HCTR+1
      ENTRY QSCAN1(*,*,*)
   45 AT=AT+1
      IF (ACUM(AT).EQ.STAR) GO TO 70
      IF (ACUM(AT).EQ.OP.OR.ACUM(AT).EQ.COMMA.OR.ACUM(AT).EQ.CP)
     1  GO TO 45
      IF (ACUM(AT).EQ.HYP) GO TO 40
      GO TO 20
   50 RETURN 1
   60 RETURN 2
   70 RETURN 3
      END
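QSCAN numbers keyword positions with HCTR: a hyphen advances the count without producing a keyword, `lo:hi` loads SERCH and UPRAN with a range, and `&` keeps the next keyword in the same position. Assuming a query of the form NAME(k1,k2,...), that bookkeeping can be sketched in Python as below. The tuple output and the function name are illustrative, and the sketch is more permissive than QSCAN (for example, it accepts a range joined to another keyword with `&`).

```python
def scan_query(query):
    """Split a query's keyword list into (position, keyword, upper) tuples.

    Positions are comma-separated and numbered from 1 (HCTR); '-' leaves a
    position unused; 'lo:hi' gives a range (upper is None for a plain
    keyword); '&' joins several keywords in the same position.
    """
    body = query[query.index("(") + 1:query.rindex(")")]
    result = []
    for position, field in enumerate(body.split(","), start=1):
        field = field.strip()
        if field == "-":
            continue
        for term in field.split("&"):
            lo, sep, hi = term.partition(":")
            result.append((position, lo, hi if sep else None))
    return result
```

For example, `FMT(SMITH,-,A:C,JONES&BROWN)` yields SMITH in position 1, the range A to C in position 3, and both JONES and BROWN in position 4.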
12. SPONSORING MILITARY ACTIVITY

Monterey, California 93940


An on-line, general-purpose fact-retrieval system is presented which
employs a classificatory data structuring technique. The technique embraces
the basic concept of hierarchical classification of data and provides users
with multiple avenues of access to a data file. Additionally, the data file may
be partitioned into unrelated data sets.